Code Your Own Music Streaming Service
It's actually really quite simple, and may soon even be almost legal (I'll get back to that point another time). Almost all media players can stream over HTTP, the primary protocol of the web, so it's just a matter of making the files available. You could simply put your music collection in a folder that is available to the internet and go from there. But I decided to do it with slightly more finesse.
The setup is on my dear Eliza: one large chunk of music and a pretty standard LAMP[1] stack. The database is superfluous to the simple task at hand, as all the data is in the file system.
Firstly, I wanted to make sure that all my actual media files were outside the document root of the website. The music still needs to be readable by the web server, which was achieved by adding the web server's user to the music group on the machine. This also meant that if anyone tried a path-traversal attack to, say, view folders outside the music folder, the web server would be limited to what its own user and the music group can read, so most such requests would simply return nothing. The planned code was only three files (one of which I have yet to implement) plus a folder for cache purposes. A few assumptions were made, such as that the music files' extensions are entirely representative of their content types, but as data cannot be uploaded and is already being streamed via SMB to a Windows box, these assumptions are within the bounds of the project's specification.
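Because the project trusts file extensions, picking a content type can be a plain lookup on the extension. A minimal sketch, assuming a hypothetical `content_type_for()` helper and an illustrative extension table (not the author's code):

```php
<?php
// Hypothetical helper: trusts the file extension, as the project's
// assumptions allow, and returns null for anything it doesn't know.
function content_type_for(string $path): ?string
{
    $types = [
        'mp3'  => 'audio/mpeg',
        'flac' => 'audio/flac',
        'ogg'  => 'audio/ogg',
    ];
    // pathinfo() returns "" when there is no extension, which maps to null.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $types[$ext] ?? null;
}

echo content_type_for('/Artist/Album/01 Track.FLAC'), "\n";
```

The same lookup is a natural place to hang the per-format decoder choice when the streaming script is extended beyond FLAC.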
The first section of code is the actual streaming code. This stripped-down version only handles FLAC[2], but extending it is just a switch on the file's extension.
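```php
<?php
// PATH_INFO is whatever follows the script name in the URL,
// e.g. "/Artist/Album/01 Track.flac".
$source = '/mnt/data/music' . ($_SERVER['PATH_INFO'] ?? '');
if (!is_file($source) || !is_readable($source))
{
    // Send a real 404 status line, without echoing the server-side path back.
    header('HTTP/1.1 404 Not Found');
    echo '404 Not Found';
    exit;
}
$cache = __DIR__ . '/cache/' . md5($source);
if (!file_exists($cache))
{
    $file = fopen($cache, 'wb');
    flock($file, LOCK_EX);
    // flac (to standard out) (decode) (flac file) -> lame (quality 6) (stdin) (output to cache)
    // escapeshellarg() keeps quotes, $ and backticks in file names away from the shell.
    $command = 'flac -c -d ' . escapeshellarg($source)
             . ' | lame -q 6 - ' . escapeshellarg($cache);
    exec($command, $output, $status); // $status is lame's exit code (last in the pipe)
    flock($file, LOCK_UN);
    if ($status !== 0)
    {
        // Don't leave a half-written file in the cache to be served forever.
        fclose($file);
        unlink($cache);
        header('HTTP/1.1 500 Internal Server Error');
        exit;
    }
}
else
{
    $file = fopen($cache, 'rb');
}
$expires = 60*60*24*14; // two weeks, in seconds
header('Pragma: public');
header('Cache-Control: public, max-age=' . $expires);
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $expires) . ' GMT');
flock($file, LOCK_SH);
clearstatcache(true, $cache); // don't let a cached stat() hide the new file size
header('Content-Type: audio/mpeg');
// A MIME header rather than an HTTP one; harmless, but HTTP clients generally ignore it.
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($cache));
readfile($cache);
flock($file, LOCK_UN);
fclose($file);
```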
So, things to draw attention to. Let's start at the beginning, with the PHP tag. Yes, this is all of the code: I have deliberately left the closing tag off. My habits mean that the last character in a file should be a newline, and I have had some issues with mismatched line endings and older versions of PHP, so I leave out the closing ?> marker whenever I don't need it. The interpreter assumes it on reaching the end of the file, and leaving it out removes the possibility of stray characters after it being sent as output, which would send the headers (or the data) prematurely.
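The stray-output problem is easy to reproduce: the `?>` tag swallows one newline immediately after it, but anything beyond that is printed, and once output has started `header()` can no longer send headers. A small sketch with a hypothetical `output_of()` helper that captures what an included file prints:

```php
<?php
// Hypothetical helper: write PHP source to a temporary file, include it,
// and return whatever it printed.
function output_of(string $source): string
{
    $path = tempnam(sys_get_temp_dir(), 'php');
    file_put_contents($path, $source);
    ob_start();
    include $path;
    $printed = ob_get_clean();
    unlink($path);
    return $printed;
}

// Closing tag followed by a blank line: one stray newline escapes.
var_dump(output_of("<?php\n\$x = 1;\n?>\n\n"));
// No closing tag at all: nothing can leak.
var_dump(output_of("<?php\n\$x = 1;\n"));
```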
Next: $_SERVER['PATH_INFO'], a most useful parameter which saves a lot of messing around with mod_rewrite. Apache is reasonably intelligent in its file handling: given a path, it works left to right along it, matching things from the webroot. If it encounters a file rather than a directory, it treats the request as a request for that file, with the rest of the path being auxiliary information, which PHP can get from the $_SERVER super-global. Then we check that we can find the original music file and, if we can't, send back a very basic 404 page.
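Roughly, the split behaves like the sketch below: walk the URL path from the document root, and the first component that is a regular file becomes the script, with the remainder becoming `PATH_INFO`. This is a simplified model for illustration (the `split_path_info()` helper is hypothetical), not Apache's actual implementation, which also depends on settings such as `AcceptPathInfo`.

```php
<?php
// Hypothetical, simplified model of how a URL path is split into the
// script to run and the trailing PATH_INFO.
function split_path_info(string $docroot, string $urlPath): ?array
{
    $parts = explode('/', trim($urlPath, '/'));
    $current = rtrim($docroot, '/');
    foreach ($parts as $i => $part) {
        $current .= '/' . $part;
        if (is_file($current)) {
            // Everything after the script becomes PATH_INFO.
            $rest = array_slice($parts, $i + 1);
            return [
                'script'    => $current,
                'path_info' => $rest ? '/' . implode('/', $rest) : '',
            ];
        }
        if (!is_dir($current)) {
            return null; // nothing on disk matches: a plain 404
        }
    }
    return null; // the path named a directory, not a script
}

// Demo with a throwaway document root containing one script.
$root = sys_get_temp_dir() . '/docroot_' . uniqid();
mkdir($root);
touch("$root/stream.php");
print_r(split_path_info($root, '/stream.php/Artist/Album/01.flac'));
```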
Now, I don't particularly want to stream FLAC at the better part of a megabit per second, so we need to perform some compression on the music. However, compression takes time, and I don't want to hang around too long waiting for my music to be generated as well as buffered. So a cache folder is made for storing the songs, designed essentially as a file-system hashmap. The key is the MD5 hash of the music file's path (assumptions are being made about hash uniqueness, but that's another discussion). If the cached file exists, we don't need to regenerate it.
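The cache lookup amounts to a hash-keyed directory. A minimal sketch, using hypothetical helper names (`cache_path()`, `is_cached()`) and assuming the cache directory sits next to the script:

```php
<?php
// Hypothetical helpers: the cache is a flat directory keyed by the MD5 of
// the source file's full path, so the same track always maps to the same file.
function cache_path(string $cacheDir, string $source): string
{
    return rtrim($cacheDir, '/') . '/' . md5($source);
}

function is_cached(string $cacheDir, string $source): bool
{
    return file_exists(cache_path($cacheDir, $source));
}

echo cache_path(__DIR__ . '/cache', '/mnt/data/music/Artist/Album/01.flac'), "\n";
```

Because the key is derived from the path alone, replacing a file in place keeps serving the old cached MP3 until that cache entry is deleted.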
There are some concurrency issues here, whereby two near-simultaneous requests for the same non-cached file may result in one client receiving nothing. PHP's locking system is entirely advisory, so I've had to assume that this service is the only thing using the cache, which one would hope to be true. Unfortunately, it doesn't allow you to lock a file before you create it, so the file is created and then an exclusive lock is acquired. As this operation is non-atomic, a reading thread can find that the file exists and lock it before the writer thread acquires its lock, leading to the blank-file issue described above. Otherwise, the writer locks the file, blocking all readers until it has written the cache file; after that, the file can be read by multiple threads at the same time. The conversion command is somewhat alchemical, but basically uses the FLAC executable to decode the file to a raw WAV stream, and has the shell pipe this into LAME, one of the more common MP3 encoders[3].
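One common way around the create-then-lock race (a swapped-in technique, not what the script above does) is to encode into a temporary file in the same directory and `rename()` it into place once it is complete. On POSIX filesystems a rename within one filesystem is atomic, so a reader sees either no cache file or a finished one. A sketch with a hypothetical `build_cache()` helper, where the `$produce` callback stands in for the flac | lame pipeline:

```php
<?php
// Hypothetical helper: $produce writes the finished data to the temporary
// path it is given (in the real script, this would run the flac | lame pipeline).
function build_cache(string $cache, callable $produce): bool
{
    // Same directory as the final file, so rename() stays on one filesystem.
    $tmp = tempnam(dirname($cache), 'partial');
    if ($tmp === false) {
        return false;
    }
    if (!$produce($tmp)) {
        unlink($tmp); // discard the partial output
        return false;
    }
    // Readers that check file_exists($cache) now only ever see a complete file.
    return rename($tmp, $cache);
}

$cache = sys_get_temp_dir() . '/' . md5('example-' . uniqid());
$ok = build_cache($cache, function (string $tmp): bool {
    return file_put_contents($tmp, 'encoded audio') !== false;
});
var_dump($ok);
```

Two simultaneous requests may both encode the track, but each rename installs a complete file, so the worst case is wasted work rather than an empty response.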
So, once we have our file (note that the writing thread releases its exclusive lock and then acquires a shared lock later on), we can start outputting data to the client. First, we set up the client's caching parameters: these files aren't going to change often, so there's no point in the client re-downloading them every time if it can avoid it. The exact headers used for caching are a bit of a mess, as different browsers (and other applications) implemented different versions of different standards. Then we give some information about the content: the format, the size, and the fact that the client should explicitly handle the data as a byte stream and not as nice ASCII/UTF-8 characters. A shared lock is acquired first, to make sure that the file has been completely written and that the reported size is up to date.
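The caching headers can be built in one place so their lifetimes always agree. A minimal sketch with a hypothetical `cache_headers()` helper that takes the current time as a parameter, which also makes it easy to check:

```php
<?php
// Hypothetical helper: caching headers for a response that may be reused
// for $lifetime seconds from $now. Cache-Control is the HTTP/1.1 header,
// Expires its HTTP/1.0 predecessor; "Pragma: public" is non-standard but
// kept to match the script above.
function cache_headers(int $lifetime, int $now): array
{
    return [
        'Pragma: public',
        'Cache-Control: public, max-age=' . $lifetime,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $lifetime) . ' GMT',
    ];
}

foreach (cache_headers(60 * 60 * 24 * 14, time()) as $line) {
    echo $line, "\n"; // in the streaming script these would go to header($line)
}
```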
Then, we read the file. readfile() passes the contents of the file directly to the output buffer, making this operation trivially simple. Finally, like all good programmers, we do some cleaning up explicitly: the file lock is released, and the file handle is closed.
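The read-and-tidy-up step can also be wrapped so that the lock and handle are released even if something fails partway through. A sketch using a hypothetical `send_locked()` helper, with `fpassthru()` on the locked handle in place of `readfile()`, and `finally` doing the explicit cleanup:

```php
<?php
// Hypothetical helper: send a file to the output under a shared lock and
// always release the lock and close the handle afterwards.
function send_locked(string $path): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        flock($handle, LOCK_SH);   // wait for any writer holding LOCK_EX
        return fpassthru($handle); // copy the rest of the file to the output
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}

$track = tempnam(sys_get_temp_dir(), 'mp3');
file_put_contents($track, 'not really an mp3');
$sent = send_locked($track); // prints the file's bytes
unlink($track);
```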
The other script was just a very quick browser, built on the same principles and PHP's built-in directory listing functions. The only difficulty is checking that the absolute resolved pathname falls within the music folder, although for the most part that is covered by the file permissions. Only files with known extensions are shown, for simplicity's sake.
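The containment check and extension filter might look something like this. A sketch only: `list_music()` and its default extension list are hypothetical, not the author's browser script.

```php
<?php
// Hypothetical helper for the browser script: list one directory inside the
// music root, refusing anything that resolves outside it (e.g. via "..").
function list_music(string $root, string $relative, array $extensions = ['flac', 'mp3']): ?array
{
    // realpath() resolves "..", "." and symlinks, and returns false if missing.
    $rootReal = realpath($root);
    $dirReal = realpath($root . '/' . $relative);
    if ($rootReal === false || $dirReal === false || !is_dir($dirReal)) {
        return null;
    }
    if ($dirReal !== $rootReal && strpos($dirReal, $rootReal . '/') !== 0) {
        return null; // resolved path escapes the music folder
    }
    $dirs = [];
    $files = [];
    foreach (scandir($dirReal) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        if (is_dir("$dirReal/$entry")) {
            $dirs[] = $entry;
        } elseif (in_array(strtolower(pathinfo($entry, PATHINFO_EXTENSION)), $extensions, true)) {
            $files[] = $entry; // only known extensions are shown
        }
    }
    return ['dirs' => $dirs, 'files' => $files];
}
```

Checking the resolved path rather than the raw request string means tricks like `Artist/../../etc` are caught after normalisation, whatever form they arrive in.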
Now, there's a lot you can do from here. Throw an AJAX library and an iframe in, and you can build your own version of Grooveshark pretty damn quickly :)