13 mar 2008

purple WH

The Role of Recommendations

So, on blogging: I was reading a post by Kurt Jacobson on avoiding mob rule in determining music quality. I have thoughts on the issue and want to have a conversation. I don't actually know the guy, so through the medium of blogging we both just shout our ideas out into the ether.

The basic idea he's hitting on is social proof. It's essentially the fact (and I say fact because there's enough research to make it pretty much indisputable) that human perception and preference are highly malleable, and one of the biggest influences on people is what they see other people doing.

In music recommendation this played out in a recent study on recommendations in an artificial culture. One group was given a bunch of songs and asked to rank them. Another group was given the same task, but they were given feedback on how much other people liked the same songs.

The results from the first group were fairly consistent. This suggests that there are some inherent qualities of songs that are consistent across listeners. (The test was conducted on a "teen-interest" site, so there was probably some perceptual preference similarity among samples.) With the introduction of social influence, scores became much more erratic and predictability dropped.

Jacobson's argument is essentially that Britney Spears is crap, and that if we let the mob decide what good music is, we'll have a sort of dysgenic collapse as the masses all follow each other into a collective mediocrity.

There's certainly something to that argument. The consistency in preferences without social influence says that there is some sort of inherent quality to music. (Whether this quality would hold up under a more diverse sample population is debatable.) If I want to recommend a song to someone who has none of the social feedback, I am going to want to use the consistent sample set because I can predict with less error that they're going to like the song.

The idea, though, that technology should somehow try to correct for humanity's herding instinct is philosophically dangerous and ultimately counterproductive. Most people don't want you to correct for their weaknesses. They want you to give them what they want. In fact, the bulk of people would find the technologist's assertion that their preferences need correcting slightly insulting.

One of the things I would have liked to see in the study was means for the data. They put data in terms of market share which is interesting, but I would like to know if the average rating was higher. There was the recent study on how a $2.50 placebo is 25% more effective than a $.10 one. I wouldn't be surprised if the user enjoyment of music with high community approval was higher than when the song was heard in isolation. It isn't just that people take cues from their environment, but that their actual experience is different.

So instead of trying to somehow fight against social proof, use it and try to introduce systemic constraints to encourage novelty.

Imagine a site where people nominate songs for song of the day. A song of the day can never have been a song of the day before. (Or to have more variety, disallow the top 10 each day from future consideration.) The song of the day is decided by votes from all the users. There's some sort of mechanism where the more often a user is correct, the more their opinion counts. I'd have to think on the details, but you essentially develop experts within a particular genre whose natural preference profiles are very similar to a stereotype. This would bias the results toward reproducibility.
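Here's a rough Python sketch of how that voting could work. Everything in it is an assumption on my part: the function names, the starting weight of 1.0, and the rule that backing a winner adds a flat 0.5 to your weight are placeholders for details I haven't worked out.

```python
from collections import defaultdict


def pick_song_of_the_day(votes, weights, past_winners):
    """Return the nominee with the highest weighted vote total.

    votes: dict mapping user -> nominated song
    weights: dict mapping user -> voting weight (assumed to default to 1.0)
    past_winners: set of songs that are no longer eligible
    """
    totals = defaultdict(float)
    for user, song in votes.items():
        if song in past_winners:
            continue  # a song can only ever be song of the day once
        totals[song] += weights.get(user, 1.0)
    if not totals:
        return None
    # Iterate in sorted order so ties are broken alphabetically.
    return max(sorted(totals), key=lambda song: totals[song])


def update_weights(votes, weights, winner, bonus=0.5):
    """Hypothetical rule: users who backed the winner count for more next time."""
    updated = dict(weights)
    for user, song in votes.items():
        if song == winner:
            updated[user] = updated.get(user, 1.0) + bonus
    return updated


# Day one: everyone starts with the same weight.
weights = {}
past_winners = set()
votes_day1 = {"ann": "Song A", "bob": "Song A", "cat": "Song B"}
winner1 = pick_song_of_the_day(votes_day1, weights, past_winners)
past_winners.add(winner1)
weights = update_weights(votes_day1, weights, winner1)

# Day two: bob's vote for yesterday's winner is ignored, and ann's
# earlier correct call lets her outvote cat one-on-one.
votes_day2 = {"ann": "Song C", "bob": "Song A", "cat": "Song D"}
winner2 = pick_song_of_the_day(votes_day2, weights, past_winners)
```

A real version would need some decay on the weights, otherwise early winners would dominate forever. That's one of the details that still needs thinking.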

The details would need some thinking, but the basic idea is you want to use social proof to increase a user's enjoyment while somehow encouraging the introduction of novel ideas. There's already a hugely implemented system dealing with this exact problem: the stock market. You want to find a resource before anyone else knows about it (find novel music) and after you've invested in it, you want to see it perform as well as possible (have as many people rank it as possible). I think a site based around that concept would be interesting.

I don't disagree that there is such a thing as great art. I'm just not convinced that computers are capable of recognizing that on their own. The best that we can do is set things up so that hopefully the human capability to recognize beauty is captured as much as possible and leveraged.

06 feb 2008

Why I Like MusicBrainz

Recent years have seen an increasing amount of interest in music recommendations. Music is being created and published at a rate that far outstrips the ability of any listener to keep up with. How then, if I'm an artist, can I get my awesome new song to the ears of listeners? Or, if I'm a music connoisseur, how can I sort through the thousands of songs to find the few I'll love?

The oldest recommendation system is still going strong: word of mouth. "Dude, you've got to hear this…" will always be a vector for music to spread. Another old-school system is one where I pick a bunch of songs I like and send them out over the airwaves to anyone who would like to hear. The internet is changing the ballgame, however. The advent of computers, in addition to fundamentally altering how people create and distribute music, also promises to change how they discover it.

Some of the earliest systems were what are known as "collaborative filters." A computer system, like Amazon, knows thousands of albums that various people have purchased. Assuming that people liked the albums they bought, if Bob and I both bought ten of the same albums, when Bob buys an 11th, there's a better than average chance that I'll like that album as well.
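To make the Bob example concrete, here is a minimal Python sketch of that kind of filter. It is only an illustration under my own assumptions: the `recommend` function, the `min_shared` threshold and the sample data are made up, and a real system like Amazon's is certainly more involved than counting shared purchases.

```python
def recommend(purchases, customer, min_shared=2):
    """Suggest albums bought by customers whose purchases overlap with ours.

    purchases: dict mapping customer name -> set of album names
    min_shared: assumed cutoff for how many albums two customers must
                share before one's purchases count as evidence for the other
    """
    mine = purchases[customer]
    scores = {}
    for other, theirs in purchases.items():
        if other == customer:
            continue
        shared = len(mine & theirs)
        if shared < min_shared:
            continue  # not enough overlap to trust this person's taste
        for album in theirs - mine:
            # Weight each suggestion by how similar the other customer is.
            scores[album] = scores.get(album, 0) + shared
    # Best-scoring albums first, alphabetical order for ties.
    return sorted(scores, key=lambda album: (-scores[album], album))


purchases = {
    "me": {"A", "B", "C"},
    "bob": {"A", "B", "C", "D"},
    "eve": {"A", "X"},
}
suggestions = recommend(purchases, "me")
```

Bob shares three albums with me, so his extra album gets suggested. Eve only shares one, so her other purchase is ignored.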

Systems like Pandora work in a similar way, but not only have they broken things down from albums to individual songs, they also look at specific characteristics of songs. So, now the system can say, not only, "people who liked the songs you like also liked…," but also, "you have liked 9 of 10 R&B songs I played for you with melodic tones and three male singers, here's another song that sounds like those."

Pandora's knowledge of music comes from the Music Genome Project. Human listeners have spent countless hours reviewing "tens of thousands of artists" and cataloging God-only-knows how many characteristics. That's cool, but not really an undertaking the average person has time for, so there's quite a bit of research into how to get computer programs to listen to the songs for us.

Systems employ a variety of different types of signal analyses, most taking into account human perceptual characteristics (we are more sensitive to high frequency noises than low ones). They not only attempt to determine traditional characteristics such as beats per minute, key, and timing; they also represent "timbral characteristics," like the signal strength at 3-4kHz, that a human listener would never use to describe music.
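As a toy illustration of a feature like "signal strength at 3-4kHz," the Python sketch below measures what fraction of a signal's energy falls in a frequency band using a naive discrete Fourier transform. This is my own simplification, not how any of the systems mentioned here actually work: the function name, the 16 kHz sample rate and the synthetic two-tone signal are all assumptions, and real analyzers use fast transforms and perceptual weighting.

```python
import cmath
import math


def band_energy_fraction(samples, sample_rate, low_hz, high_hz):
    """Fraction of spectral energy between low_hz and high_hz (naive DFT)."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    # Only the bins strictly between DC and the Nyquist frequency are needed.
    for k in range(1, n // 2):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        power = abs(coeff) ** 2
        freq = k * sample_rate / n
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total else 0.0


# Synthetic test signal: a loud 3.5 kHz tone plus a quieter 500 Hz tone.
sample_rate = 16000
n = 320  # chosen so both tones land exactly on a 50 Hz-wide bin
signal = [
    math.sin(2 * math.pi * 3500 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 500 * i / sample_rate)
    for i in range(n)
]
fraction = band_energy_fraction(signal, sample_rate, 3000, 4000)
```

The quieter tone has half the amplitude and so a quarter of the energy, which puts roughly four fifths of the total in the 3-4 kHz band.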

There's quite a bit of research in this area including the International Conferences on Music Information Retrieval and Related Activities, the Laboratory for the Recognition and Organization of Speech and Audio at Columbia, and the Music Technology Group at the Universitat Pompeu Fabra. Research is also driven by a rapidly growing group of private commercial interests too numerous to list.

The more interesting developments for me, a solo programmer and graduate student in an unrelated field, are open-source projects. These are areas where I can take part and help create something interesting. One of the more promising is Paul Lamere's Search Inside the Music project with Sun Microsystems.

Which brings us to a crucial issue for every computer program ever written: data. Systems like Last.fm will integrate with your music player so that every song you play goes toward helping the system build a profile of preferences. Accessing actual songs to play with spectral analysis on a large scale is a bit more complex. The options are fourfold:

  1. Make a deal with the record companies gaining some sort of license — Unlikely
  2. Buy tens of thousands of songs online — I'm poor
  3. Use creative commons sources such as Magnatune — Limits selection
  4. Steal as much music as the internet will allow — Illegal

Each of these has advantages, but ultimately with all of them I run out of space to store all these songs. Also, part of the point is community building, and the point at which I try to open the system to other people (other than with the creative commons sources), I'm in legal trouble. What would really be nice is if I had an online system that I could query about the characteristics of songs (or a database of info I could download). The tricky part is building that database.

In the last couple weeks, a couple systems have come online (EchoNest & Vienna MIR) that let you send in a song and get back an XML description of the audio characteristics. That's a fine idea, but it limits the songs I can do computations on to songs I have a copy of. It also means I need to transfer whole files over, so if I want to experiment on a 5,000 song set, I need to start about a week early just so I can send all my songs over.

Right now I'm listening to The Nature by Talib Kweli. If someone else has already analyzed this song, I don't need to send the song over. I just need to tell the system I want The Nature and it can send me the data I want. This saves both the time to transfer the song and the time to analyze it.

Now the problem becomes how to tell the system what song I want. For many rap and folk bands there are often myriad versions of a particular song. Lots of new XML-based technologies use URIs, and that makes sense, but how to make a unique one for each song? Enter MusicBrainz. They are a huge (more than 6,300,000 tracks and growing) community-maintained database of songs, and they've assigned a unique identifier to each one. If I know the MusicBrainz id for a particular song, then I can refer to it unambiguously when talking to anyone else.
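The "analyze once, look up by id afterwards" idea can be sketched in a few lines of Python. This is purely hypothetical: `AnalysisCache`, `features_for` and the stand-in analyzer are names I made up, the id below is a placeholder rather than a real MusicBrainz identifier, and no actual web service is being called.

```python
class AnalysisCache:
    """Stores analysis results keyed by a track identifier.

    The audio is only loaded and analyzed when nobody has
    analyzed that identifier before.
    """

    def __init__(self, analyze):
        self._analyze = analyze  # function: audio bytes -> feature dict
        self._by_id = {}
        self.analysis_runs = 0

    def features_for(self, track_id, load_audio):
        if track_id not in self._by_id:
            # Cache miss: this is the only case that pays for the
            # transfer (load_audio) and the analysis.
            audio = load_audio()
            self._by_id[track_id] = self._analyze(audio)
            self.analysis_runs += 1
        return self._by_id[track_id]


def fake_analyze(audio):
    """Stand-in for a real spectral analysis."""
    return {"length": len(audio)}


cache = AnalysisCache(fake_analyze)
track_id = "00000000-0000-0000-0000-000000000001"  # placeholder id
first = cache.features_for(track_id, lambda: b"\x00" * 1024)
second = cache.features_for(track_id, lambda: b"\x00" * 1024)
```

The second request never touches the audio, which is the whole point: once one person has paid for the transfer and the analysis, everyone else just needs the id.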

MusicBrainz has some limitations that would eventually be a problem. Mostly, they are "album-based," meaning they don't deal with DJ mixes or bootlegs. Also, so far as being a completely open process goes, they use MusicDNS, which is partially encumbered. That would matter eventually, but those concerns are outweighed by the huge amount of data and community support you'd get.

There may well be other good solutions to the data problem. I like the MusicBrainz idea because it gives a project like Search Inside the Music a pre-built community and support structure. It would also draw users to help build MusicBrainz as a resource for both tagging and music recommendation applications.

26 jan 2008

Eureka

This is complete geekitude, so feel free to disregard…

I want to take a DVD, stick it in the drive and at some point have the smallest file possible that contains as much of the original structure as possible.

There are a few problems with how this is usually done. When you download a video off the internet there are several parts that go into that file. There's the sound and the video streams and then a container that joins them all together. The container syncs the streams and says, this second of audio goes with this second of video.

The most common container format in use currently is AVI from Microsoft. AVI is a fairly old format and it has several limitations. One of the most important for my purposes is it pairs one video with one audio track. That's fine when you're encoding your home movies, but it can't handle dubbing and director's commentaries for DVDs.

There are a couple new formats to consider, but the one I like the most is Matroska. It's flexible and supports not only the multiple audio streams but also multiple subtitles.

Inside the container there are even more standards at work. The raw data for audio waveforms and video images is huge. Different compression schemes, called codecs, are used to compress the data and make it take up less space. They operate in a variety of ways, but ultimately the goal is a tradeoff between the degradation of the image or sound quality and the size of the output file.

Audio compression has seen a bit of improvement over the old workhorse MP3, but the real improvements in technology in recent years have been in the realm of video compression. H.264 is able to produce a file that is nearly visually identical to the MPEG-2 used on DVDs in a fraction of the space.

There are some programs, notably HandBrake, that do a pretty good job of doing what I want. Before HandBrake was released however, I started on a ripping program of my own. I've been playing with it for a bit and this morning I had a bit of a breakthrough.

When I list the contents of the DVD, it tells me in general that the video is in NTSC. That makes sense, since NTSC is what we use here in America. The encoder was complaining though about dropping frames which shouldn't happen if I was encoding at the right rate.

It turns out that although NTSC is nominally 60000/1001 fields per second (interlaced, so 30000/1001 full frames per second), movies are shot at 24 frames per second and stored on progressive DVDs at 24000/1001. I had been running my encoding at 29.97 and ending up with a file around 1500mb. I switched to 23.976 and the file size dropped by 500mb.
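The rates are easier to keep straight as exact fractions. The Python below just does that arithmetic; the variable names are mine.

```python
from fractions import Fraction

ntsc_fields = Fraction(60000, 1001)   # interlaced fields per second
ntsc_frames = Fraction(30000, 1001)   # full frames per second
film_on_dvd = Fraction(24000, 1001)   # progressive film material

# The decimal rates the encoder asks for.
ntsc_decimal = round(float(ntsc_frames), 3)
film_decimal = round(float(film_on_dvd), 3)

# How many frames the film rate encodes relative to the NTSC frame rate.
frame_ratio = film_on_dvd / ntsc_frames
```

`frame_ratio` comes out to exactly 4/5, so encoding at the film rate means a fifth fewer frames to compress. How much that shrinks the file depends on the encoder settings.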

29 jun 2007

(no subject)

Between Wayne, my mom and myself, we took about 2000 pictures over our two weeks in Peru. They were all named random stuff like IMG_1985.jpg. I figured that to get them sorted, I'd name them according to their dates. Once I got it done I discovered that Wayne's camera is off by 12 hours. After a little research I found that there's a handy Linux utility that does exactly what I need:

jhead -ta-12:00 -model SD500 *jpg

To do the renaming then is:

jhead -n%Y:%m:%d_%H:%M:%S *jpg
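For anyone without jhead handy, here is a small Python sketch of what those two steps amount to for a single photo: shift the timestamp by twelve hours, then build the new file name from it. It only manipulates a timestamp string in the usual EXIF "YYYY:MM:DD HH:MM:SS" layout and doesn't read or rename real files. The function name and the sample timestamp are made up.

```python
from datetime import datetime, timedelta

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"
NAME_FORMAT = "%Y:%m:%d_%H:%M:%S"  # same pattern as the jhead -n argument


def corrected_name(exif_timestamp, shift_hours=-12):
    """Return the file name a photo would get after the time correction."""
    taken = datetime.strptime(exif_timestamp, EXIF_FORMAT)
    fixed = taken + timedelta(hours=shift_hours)
    return fixed.strftime(NAME_FORMAT) + ".jpg"


# A photo the camera claims was taken at 3:15 in the morning.
name = corrected_name("2007:06:10 03:15:42")
```

Note that the shift can roll the date back a day, which is exactly why the correction has to happen before the rename.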

28 jun 2007

Photos

Gallery has died again. It's getting kinda irritating since it tends to do so about every month or so. I don't think it is really the gallery program's fault entirely. I suspect that Dreamhost is cutting the processes off early and it isn't handling it well.

It's a pain not only because of the outage but also because I have to reload everything. There are too many pictures, though, and it chokes. Even after loading them incrementally, I still lose the captions. (Version changes kept me from restoring the database.)

Flickr is cool. I like the geotagging and community bits. I'm tempted to drop the $25 to create an account for a year, since it will take a while to upload 10gb of photos 100mb at a time. Smugmug looks pretty good too, but it's $40. There's also Picasa, which would be $25 for 6.25gb.

Honestly I'm frustrated since I have hosting. It's kinda sad that there's not a FLOSS offering that works well. I love the idea of trying to write one myself, but who's got the time? Maybe someday.

02 mar 2007

Feisty Fawn

As I was working on learning MySpace, I did a bit of Googlebaiting and found out that a little bit of python I wrote a while back is in Ubuntu. That's pretty cool.

04 feb 2007

Damned Circuits

I may have figured out why the hell my simple microcontroller project won't work.

If you look at this schematic, the one provided by the project, you'll note that the serial port driver (the smaller chip) is a MAX232. The capacitors connected to it are 100 nF = 100×10⁻⁹ F = 10⁻⁷ F = 0.1µF. 0.1µF capacitors are what the MAX232A uses. The MAX232 uses 1µF capacitors. I've spent many hours trying to get this damn thing to work. If this turns out to be the issue, I don't know if I am going to be more irritated or relieved. I definitely want to try and replace the existing project because it has repeatedly not shown the sort of meticulousness I would expect from someone working with electronics.

On a related note, I've been reminded of a project I have wanted to do for a long time which is take the IC Masturbator and do a semantic capture of the information.

The Masturbator is the pin outs of hundreds of chips, but it is all little ascii diagrams. I'd like to design an XML chip description language which could be used to drive a better site and generate prettier diagrams and truth tables.
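To give a feel for what I mean, here is a Python sketch with a made-up XML format. The element and attribute names (`chip`, `pin`, `number`, `label`) are invented for this example and are not any existing standard. It describes a 7400 quad NAND gate and renders the pinout for a dual in-line package back out as text.

```python
import xml.etree.ElementTree as ET

# Hypothetical chip description; the schema is invented for illustration.
CHIP_XML = """
<chip name="7400" description="Quad 2-input NAND gate">
  <pin number="1" label="1A"/>
  <pin number="2" label="1B"/>
  <pin number="3" label="1Y"/>
  <pin number="4" label="2A"/>
  <pin number="5" label="2B"/>
  <pin number="6" label="2Y"/>
  <pin number="7" label="GND"/>
  <pin number="8" label="3Y"/>
  <pin number="9" label="3A"/>
  <pin number="10" label="3B"/>
  <pin number="11" label="4Y"/>
  <pin number="12" label="4A"/>
  <pin number="13" label="4B"/>
  <pin number="14" label="VCC"/>
</chip>
"""


def render_pinout(xml_text):
    """Draw a dual in-line package: pins run down the left, back up the right."""
    chip = ET.fromstring(xml_text)
    labels = {int(p.get("number")): p.get("label") for p in chip.findall("pin")}
    count = len(labels)
    width = max(len(label) for label in labels.values())
    lines = [chip.get("name")]
    for row in range(count // 2):
        left = row + 1
        right = count - row  # pin across from `left` on the package
        lines.append(
            f"{labels[left]:>{width}} {left:>2} |    | {right:<2} {labels[right]}"
        )
    return "\n".join(lines)


diagram = render_pinout(CHIP_XML)
```

The same description could just as easily feed an SVG generator or a truth table, which is the appeal of capturing the data semantically instead of as a drawing.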

I think the popularity of sites like instructables.com shows a base of interest that would make it worthwhile. The next step would be to use the SVGs in a schematic design program because I've not used one yet that isn't crap.

As always, the issue is time… Never enough time.

11 jan 2007

Circuitry

I really want to make some lights blink. I think it would be amazingly cool to make little electronic gadgets and everyone knows breadboards and soldering irons are the sexiest bits of geek swag one can lay their hands on. ☺

I have been trying to break into this world for years now. I started back in college when Steve gave me this USB doohickey that went with his monitor and had a couple buttons for controlling volume and whatnot. I tried writing a kernel module to interface with it. The thing is, I'm pretty sure it's broken. There is some initial setup stuff that it should do when it's plugged in and I'm getting errors saying it isn't doing that.

Attempt #2 was to do the USB-IR-Boy project. This is just a little sensor to let me control my computer with a TV remote. I'm currently stuck. I've got a firmware image for the microcontroller (μ), but I can't get it to transfer.

Part of the issue is that the μ I'm using is the HC08 and it is less common in hobbyists' projects than PICs and BASIC Stamps. There's less support for it and many of the tools I am using are more than five years old (which is pretty old for software).

I've been using Spgmr08 and a simple programmer to try and flash the μ. I'm getting nothing and so I'm going to try the PKG08SZ software for Windows.

If that doesn't work, I'm thinking I'll dump the HC08 for the AVR. There's a GCC port and firmware USB implementation that look like they would get me most of the way toward building my receiver. So far as programming, there's a $79 starter kit that looks great to me.

08 jan 2007

Life and Times

Last June, get_brett, facing the eventual end of his work on Gears of War, resolved to be more dedicated in his work on programming projects.

To the_archange1, titivillus and me he tabled the proposition of a productivity list where, sort of like a dieting support group, we would send regular status reports about our various projects. We gave it a shot and though weekly updates didn't happen, there have been a couple hundred messages between the four of us over the last six months. Though the accountability isn't all that strong, it has definitely been nice having people to send stuff to when I make progress on something and am proud of myself.

This last week we all revisited our project list from when the list started and covered where exactly we were on various things. I'm generally a spaz with the attention span of a ferret on crack and it was actually nice for me to look at something and see that though I do suck horribly at accomplishing things, I have actually made some progress.


himinbi.org: I want a nice looking portfolio site to show off the fact that I know my ass from a hole in the ground. It may or may not be responsible for helping me land a job, but it is sure as shit not going to hurt.

This site looks identical to how it did when I wrote this. Wayne and I did register madstones.com and his logo work has been going promisingly. The next bit for this one I am doing for work: odin.himinbi.org/filmstrip/

Getting Schooled: I want to at least get my masters and, if I can get into a good program, my Ph.D. I've got the problem right now of a strong interest in electronics, but insufficient experience to know if it is something I seriously want to pursue professionally.

I got the parts for this project and wrote a little code, but wandered off and forgot much of what I was doing. McK got me a subscription to Nuts and Volts for my birthday and I got a new issue a couple days ago. It inspired me to pick this one back up, but this time I'm writing down the concepts so that when I wander off and come back I can relearn faster. odin.himinbi.org/frontpanel/

Stories: The National Storytelling Festival is in Jonesborough the first weekend in October. It is a little expensive to go to the actual festival, but I want to bring some people down, ride the Creeper Trail, hang out at the cabin, go to Midnight Cabaret and maybe go spelunking. This project only has one real technical component: riddlecreek.holcomb.info. I want to use the cabin as a selling point and the site doesn't really exist currently. This also relates to possibly renting the place in the future as well.

We did the trip, but I didn't finish the site.

pcvs.org: There are all sorts of scripts and pages and stuff that I wrote as a volunteer that would be really useful to current volunteers. The problem is that it was all hand-edited XML files and programs I ran. I'd like to migrate this into a CMS and I've got an authentication scheme I'd like to see if I could get to actually work. I think that this site would be reasonably popular (50,000+ hits/month) if I could get it working like I want. It is also my testbed for evaluating any CMS/publishing system.

This was the one I did nothing at all on. I want to play with a CMS idea and because of the size of that I'm leaving it until I don't have a job as a distraction.

All in all, I did about how I would have guessed I would. I did lots of little bits, flitting from project to project, but not really finishing anything.

02 nov 2006

Don't Mess Up

I've been screwing with my livejournal layout. I'm about 75% of the way through, but I generally am liking how it looks.

I'm still contemplating the friends page. I've got a little script that creates an SVG and rasterizes it. It makes pretty curvy corners, but Dreamhost is slow enough that some of the corners time out. It makes things kinda ugly.

One of the bigger problems at this point is I've deleted the edit button from the post page, so if I screw up there's no fixing it. Here's to being careful.

Phones

The election is fast approaching. I have, unfortunately, found that it takes about a month to get an absentee ballot, so my voice will remain unheard this year.

Things have been heating up here in the office. There's lots of ads and whatnot. It is going to be an interesting election, though things look bad for several of our initiatives.

90% of my time has been sunk into learning Trixbox (formerly Asterisk@Home). We were getting line errors and dropped calls on our PRI line from XO, so it was decided to drop them. The problem is the guy who knows how all this works isn't here. So far as Linux admin experience, I'm the best they've got.

Not to say that I'm a bad admin. I've been running Linux as my only OS for about two years now and I like it. The issue at hand is I know diddly about VoIP.

It has been an instructional week. I like Asterisk in general and would definitely like a chance to roll out an installation (though with a slightly longer timeline). I've felt very professional calling techs all over the place and working to get things set up. It was sort of anti-climactic today when the Verizon tech and I finally got things going: when I asked him what he did, he said, "everything just needed to be rebooted." Apparently that works in the telecom world as well.

28 sep 2006

Crackers

Well, my internet is busted. Specifically, I think the WiFi access point of the guy I sublet from is defunct. So, to wile away my internetless hours I've been learning how to get into my neighbors' networks. The process is pretty simple. A little bit of terminology first:

If you are familiar with wired ethernet sniffing, you know the term promiscuous mode. An ethernet is like a room full of people all shouting at the same time and your computer just ignores anything not specifically destined for it. When you go into promiscuous mode you start listening to anything happening on the line. If people are sending passwords and whatnot without using encryption, you can just read them out of the traffic. (So I've heard. I'd certainly not know if any of my English teachers back in college had the password "60retire".)

Well, a wireless network has the concept of promiscuous mode as well. It isn't what you want though. Promiscuous mode will give you information about all the computers connected to the network with the access point that you are using. What is interesting in this situation is all the computers that are broadcasting, but which aren't on your network. To get those packets, you need to go into monitor mode.

The 802.11 protocol allows for sending out probe requests to which an access point in broadcast mode will respond. NetStumbler uses this method to detect networks. This is mostly useful in wardriving when you're not going to be around long enough for an actual broadcast. Since I'm sitting in one place, I don't really need that. Another advantage is that some access points aren't in broadcast mode and a wireless card in monitor mode will detect those as well.

My card is a 3com 3CRDAG675 and I'm using the madwifi drivers since support isn't built into the kernel. The drivers aren't supported by most of the sniffing programs, but all I have to do is put the card in monitor mode manually. (This took me a while to figure out, so I'll note it here. Assuming that the card is already up and running, do:)

wlanconfig ath0 destroy
wlanconfig ath0 create wlandev wifi0 wlanmode mon
ifconfig ath0 up

Then I start up Kismet and leave it running. Unfortunately, none of my neighbors seem to be BitTorrent fans, so traffic has been coming in pretty slow. According to an excellent article on WEP cracking, I'm going to need about 2gb of traffic. At my current rate, I ought to have that in about three months.

Because the analysis being done is statistical, different tools have different rates of success. Kismet dumps all its logs in /var/log/kismet and I'm running AirSnort and Aircrack. From reading a comparison of WEP crackers, however, it really looks like WEPLab is the way to go.

20 sep 2006

webmastery

Trying to fix some search problems with MPP's site today, I learned about Google's Webmaster Tools. It shows the search terms that bring up your sites and how high they ranked. My most popular search term? I'm #2 in Google Images for dirt.

15 aug 2006

What Little Wills are Made Of

So I went to the Y and had my fitness tested. I failed solidly with a 36/100.

I don't feel too bad about it:

  1. I couldn't touch my toes because I've got sciatica and there's not too much to do about it.
  2. My cardio test showed me to have a high heart-rate, but I did it right after I did 35 push-ups and 60 sit-ups. (There was someone else going at the same time, so I had to do the strength bit first where you do as many of each as you can in a minute.)

The bit I really wanted to find out was my body composition. I'm 30lbs of fat and 165lbs of other stuff. That's 15% which is a little better than I expected it to be. My goals are fourfold in this new gym bit:

  • 10% body fat (That's about where the abbies start showing up in guys. It's around 14% in women.)
  • bench press 195lbs (That's what I weigh and it seems like I ought to be able to bench myself.)
  • curl 100lbs 15 times (That's what McK weighs and I want to be able to throw her if she continues to sass me.)
  • touch my toes (Because I'm twenty-fucking-seven years old.)

12 aug 2006

Why oh Y?

MPP has a deal with the Y where, as a non-profit, they get 30% off all fees, so McK and I joined up. We went three times last week, and it was as unpleasant as starting back to the gym generally is. Yesterday was really interesting though: we had our introduction to the Fitlinxx system.

They've got two rooms full of equipment with little touch screens next to each one. You sit down and enter your pin. Whenever you do a set, it tracks how fast you move the weight up and down, and it counts for you. They've got some treadmills which will track your progress and there's a little kiosk where you can enter cardio stuff that you do on your own. If you graduate to the free weights then you have to write down your sets and you leave it for a trainer who enters the numbers for you. From talking to the guy, he seems to watch people who are working out and leaves little training notes for them in the system.

As I was looking up Fitlinxx to link to them, I found that all their data syncs online. Now that I've made myself an account, I can do stuff like see my stats and manually enter workouts. That's pretty cool. They show your stats as a function of everyday objects. In my workout yesterday, I lifted a total of 15,110lbs or five of the new Beetles. That sounds much more impressive than the 35 gummi bears I burned off on the elliptical trainer. ☺

10 aug 2006

Intrusion

I'm on the DC LUG listserv and someone mentioned getting hundreds of failed attempts to connect to the computer via SSH. I thought I'd take a look and see if I was having anything like that. I took a look at /var/log/secure and in the last three weeks there have been over 14,000 failed connections. michael, christopher, matthew, joshua, jacob, andrew, daniel, nicholas, tyler, joseph, david, brandon, james, john, ryan, zachary, justin, anthony, william, robert, jonathan, kyle, austin, alexander, kevin, cody, thomas, jordan, eric, benjamin, aaron, jose, christian, steven, samuel, brian, dylan, timothy, adam, nathan, richard, sean, charles, patrick, jason, luis, … It just goes on and on.

This clearly won't do. It is highly unlikely this person is going to guess both my username and my password, but I just don't like that it is going on. So, I added these rules to iptables:

iptables -I INPUT -p tcp --dport 22 -m state --state NEW -m recent --set

iptables -I INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 4 -j DROP

So, any more than three attempts to establish a connection to SSH within a minute period will cause any further connection attempts to drop. If I understand the code correctly as well, the count doesn't reset until a minute has passed, so if they keep hitting it constantly, it will never stop dropping them.
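To convince myself of that behavior, here is a toy Python model of the rule pair. It is a simplification under my own assumptions and not a faithful reimplementation of the netfilter `recent` module: every new connection attempt is recorded, and an attempt is dropped when it is at least the fourth from the same address within the last sixty seconds. The class name and the test address are made up.

```python
from collections import defaultdict, deque


class RecentLimiter:
    """Toy model of per-address rate limiting on new connections."""

    def __init__(self, seconds=60, hitcount=4):
        self.seconds = seconds
        self.hitcount = hitcount
        self._seen = defaultdict(deque)  # address -> recent attempt times

    def allow(self, source, now):
        stamps = self._seen[source]
        # Every attempt is remembered, including ones that get dropped,
        # which is why constant hammering never gets back in.
        stamps.append(now)
        while stamps and stamps[0] <= now - self.seconds:
            stamps.popleft()  # forget attempts older than the window
        return len(stamps) < self.hitcount


attacker = "192.0.2.1"  # documentation-range address, not a real host

limiter = RecentLimiter()
burst = [limiter.allow(attacker, t) for t in (0, 1, 2, 3, 4)]
hammering = [limiter.allow(attacker, t) for t in range(10, 200, 10)]

slow_limiter = RecentLimiter()
patient = [slow_limiter.allow(attacker, t) for t in range(0, 200, 20)]
```

In this model the first three attempts in the burst get through, everything after that is dropped for as long as the attempts keep coming every ten seconds, and a client that only tries once every twenty seconds is never blocked.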

If they're smart, they could figure out it is resetting after a minute since rate-limiting isn't terribly creative. Pulling off a brute force attack with one attempt every twenty seconds will certainly take a while.

24 jun 2006

(no subject)

I'm just curious if my new paid account will strip out non-HTML tags:

Proof that Girls are Evil

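  1. Girls require time and money

    \( \text{girls} = t \times \$ \)
  2. Time is money

    \( t = \$ \;\Rightarrow\; \text{girls} = \$ \times \$ = \$^2 \)
  3. Money is the root of all evil

    \( \$ = \sqrt{\text{evil}} \)
  4. Quod Erat Demonstrandum

    \( \text{girls} = \$^2 = (\sqrt{\text{evil}})^2 = \text{evil} \)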

Well, it leaves the tags, but I can't serve the page as application/xhtml+xml, so the MathML doesn't render.

How about forms?

It looks like I still can't do scripting. C'est la vie.

23 jun 2006

Formattage

Broke my journal. Will fix it soon when I've got just a bit more time. Damned StumbleUpon.

  1. Went looking for a Firefox extension to show me HTTP traffic.
  2. Saw the StumbleUpon extension and installed it along with Tamper Data.
  3. Got stuck on trying to replicate a POST request and started piddling around with StumbleUpon.
  4. Ended up at last.fm which is sort of like Pandora except music recommendations are made on user submissions rather than an external sorting system: very Web 2.0.
  5. Music has been really big for me as of late, so I created an account and installed the monitoring agent, Audioscrobbler.
  6. This allows for the creation of a pretty playlist, so I thought I'd stick that in my journal.
  7. It was in the process of playing with that that I killed my journal.
  8. Since I spend a minimum of $20 when I go out drinkin', I went ahead and got a paid account.

All I need now is time to fix the site. You'd not think that I was a professional web designer to look at my sites.

11 June 2006

purple WH

Cuteness du Jour

Rabbit In Love

These sidewalk drawings are also awesome. Definitely going on this list of things to try and use when I try and make my house the coolest place on Earth.

Btw, I'm lovin' on BitTorrent. I'm downloading the Fedora Core 5 ISO (so I can upgrade Amarok to support m4a). I've been pulling between 600 and 700 kilobytes per second the whole time. It's going to take about an hour and a half to download the 3 GB file. If it were a DVD, I could be watching it in real time.
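The hour-and-a-half estimate only works out if the rate is read as kilobytes per second rather than kilobits. A quick Python check, assuming a midpoint rate of 650 and taking the 3 GB file as 3 billion bytes (both are my assumptions, as is the function name):

```python
def download_minutes(size_bytes, rate_bytes_per_second):
    """Minutes needed to move size_bytes at a steady rate."""
    return size_bytes / rate_bytes_per_second / 60


FILE_SIZE = 3 * 10**9  # 3 GB, taken as 3 billion bytes

as_kilobytes = download_minutes(FILE_SIZE, 650 * 1000)     # 650 kB/s
as_kilobits = download_minutes(FILE_SIZE, 650 * 1000 / 8)  # 650 kbit/s

print(f"{as_kilobytes:.0f} minutes at 650 kilobytes per second")
print(f"{as_kilobits / 60:.1f} hours at 650 kilobits per second")
```

That gives about 77 minutes for kilobytes per second, against more than ten hours for kilobits per second.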

10 June 2006

purple WH

H.264

More geekery. Today's fun centered around videos and trying to get the damned things to play. The good songs DVD is part of a larger project to send a bunch of stuff off to Mauritania. Another bit of that is a couple DVDs, one with Scrubs and another with Coupling.

Through the magic of BitTorrent, I have all the episodes of both, but Coupling will only play in Xine. This isn't a huge problem for me since I'm a geek and have Xine on my Linux box. It is a problem though for the non-geek people I'm sending this to. So, I needed to fix it. The only problem is I know diddly about video stuff.

Well, the file extensions are mkv, so I started there. It turns out that mkv stands for "Matroska video." When you watch a video on your computer you are not just watching, you are listening as well, and the computer needs information about how the audio and the video sync up in order to play them correctly. Matroska is a container format: it holds the streams and describes how they fit together. Another container format that I'm sure most everyone has seen is Microsoft's avi format.

According to the Matroska site, VLC is one of their recommended players and MPlayer is supported as well, but neither works. So, more digging is in order…

I found some tools for manipulating mkvs, and mkvinfo let me know that my files contained an Ogg Vorbis sound stream and an H.264/MPEG-4 AVC video stream. (Matroska looks to be a pretty intriguing format, supporting things like multiple audio channels and subtitles.)

This was interesting, as H.264 is pretty new and the only place I've seen it so far is in IPTV podcasting on hacker sites like Hak5 and FTS. I was pretty sure this was where the problem was, since Ogg is old.

Well, I wandered the internet trying to find someone else having a similar problem, but to no avail. So then I decided to reencode the files. If you've ever seen Multiplicity, you are familiar with the "copy of a copy" principle. Every time, some data is lost and you never do anything but get further from the original. I was pretty much stymied, so I figured I'd bite the bullet.

None of the tools on my Fedora Core 4 machine could encode H.264, so I decided instead to encode to MPEG-4. It took me a bit to figure out how to get FFmpeg to do it, but eventually, I got it working.

Lesson #1. Encoding video takes for freaking ever. The MPEG encoding is a two-pass process where the first pass collects some statistics and the second uses those to encode the file. These episodes of Coupling are about 50 minutes long and 688 x 400 pixels. On my 1.7 GHz processor it takes about 25 minutes per pass.
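For anyone wanting to try the same thing, here is a rough Python sketch that builds the two command lines for a two-pass MPEG-4 encode with a current FFmpeg. It is not the exact invocation I used: the file names, the 800k bitrate, and the function name are placeholders, and the flags follow modern FFmpeg syntax rather than whatever shipped with Fedora Core 4.

```python
def two_pass_commands(source, target, video_bitrate="800k"):
    """Return the (first pass, second pass) FFmpeg argument lists."""
    common = ["ffmpeg", "-y", "-i", source,
              "-c:v", "mpeg4", "-b:v", video_bitrate]
    # Pass 1 only gathers statistics: no audio, output thrown away.
    first = common + ["-pass", "1", "-an", "-f", "null", "/dev/null"]
    # Pass 2 reads those statistics and writes the real file.
    second = common + ["-pass", "2", target]
    return first, second


# Placeholder file names, printed rather than executed.
for command in two_pass_commands("episode.mkv", "episode.mp4"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run` to actually do the encode. At 25 minutes per pass, the two passes together take about as long as the episode itself.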

Lesson #2. Whoever created these files did some magic. The original file, which looks really good, is 205 MB. My MPEG-4 was over 400 MB. This is not an acceptable increase. If I hadn't seen how little they could be, I'd probably accept it since I expect a two-hour movie to be around 1.5 GB. Since I know it is possible to do it in less space though, I am bound and determined to match that.
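Turning those file sizes into average bitrates makes the gap easier to see. This is only a rough Python calculation: it assumes a megabyte is a million bytes, lumps audio and video together, and uses the 50-minute running time from above. The function name is made up.

```python
def average_kbps(size_megabytes, minutes):
    """Average bitrate in kilobits per second for a file of a given length."""
    bits = size_megabytes * 1_000_000 * 8
    return bits / (minutes * 60) / 1000


original = average_kbps(205, 50)    # the H.264 file I started with
reencoded = average_kbps(400, 50)   # my MPEG-4 attempt
movie = average_kbps(1500, 120)     # a two-hour movie at 1.5 GB

for label, rate in [("original", original), ("re-encoded", reencoded), ("movie", movie)]:
    print(f"{label}: {rate:.0f} kbit/s")
```

The original comes out around 547 kbit/s, my re-encode around 1067 kbit/s, and the 1.5 GB movie around 1667 kbit/s, so whoever made these files is getting a good picture at about a third of the bitrate I'd normally expect.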

I'm currently running the video through x264 to see if I can't get some better compression. In this one each pass takes about an hour and a half, so I've got some waiting yet to do. I'll post the result.