Tearing Apart My Little Streamer
Or, an attempt to educate in the ways of the almighty Hyper Text Transfer Protocol.
A while ago, I published a post on How to write your own music streaming service. What I'm now going to do is go over some of the things I did wrong or failed to do, which fall within the scope of the code - the assumptions I made beyond the code's control were referenced in the original post.
Cache Collision
The system by which the local cache is generated assumes that there will be no MD5 collisions among all the file names in use by the system. Now, if you make the assumption that MD5 has an equal chance of producing each of the 2^128 possible results, you can work out the probability of a collision based on the birthday problem. However, any problem where the probability is not provably 0 should be checked for - in this case, by something as simple as appending the title of the file. Of course, the title might change (even I occasionally make tagging mistakes), causing the old version to be left in the cache indefinitely.
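To put a number on that, here is a rough sketch of the birthday-problem estimate. It is written in Python rather than the PHP of the original script, the function name is made up for illustration, and it leans on the same assumption as above: that MD5 output is uniformly spread over all 2^128 values.

```python
import math


def md5_collision_probability(n_files: int, bits: int = 128) -> float:
    """Approximate chance that at least two of n_files names share a hash.

    Uses the standard birthday approximation
        p ~= 1 - exp(-n(n-1) / (2 * 2^bits))
    which assumes every hash value is equally likely.
    """
    space = 2.0 ** bits
    exponent = -n_files * (n_files - 1) / (2.0 * space)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 1_000_000_000):
        print(n, md5_collision_probability(n))
```

Even a million files gives a probability on the order of 10^-27, so the check is about correctness on principle rather than any realistic risk.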
No HTTP Modification Headers
The HTTP specification[1] defines a lot of different standard headers. Quite a few of them make reference to the HTTP status code 304 Not Modified, which lets you tell the client that a file whose cache time has expired, and which has been re-requested, is still the same, and so saves you having to send it again.
This uses the ETag header (which the client echoes back in If-None-Match) for content matching, or the If-Modified-Since header for time matching - both of which would be easy to implement in this case (native md5sum for the ETag). There are also some other headers which you can make an argument for setting; you can, in fact, make an argument in the comments :)
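As a rough illustration of the matching logic, here is a minimal sketch in Python (not the PHP the streamer is actually written in). The function names and the plain-dict representation of headers are my own assumptions for the example; a real handler would read them from whatever the server environment provides.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_validators(content: bytes, mtime: float) -> dict:
    """Build the two response headers a client can later validate against."""
    return {
        # An ETag is a quoted opaque string; an MD5 of the content will do.
        "ETag": '"' + hashlib.md5(content).hexdigest() + '"',
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(mtime, usegmt=True),
    }


def is_not_modified(request_headers: dict, validators: dict) -> bool:
    """Return True when the server may answer 304 Not Modified."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match wins over If-Modified-Since when both are sent.
        # Drop any weak-validator prefix before comparing.
        tags = [t.strip().removeprefix("W/") for t in inm.split(",")]
        return "*" in tags or validators["ETag"] in tags
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
            last = parsedate_to_datetime(validators["Last-Modified"])
            return last <= since
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: just send the file.
            return False
    return False
```

If `is_not_modified` returns True, the script would send the 304 status with no body; otherwise it sends the file as normal along with the headers from `make_validators`.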
No ID3 Tags
Tags are a big part of music. For me, at any rate. Album data, the performers, or even the AccurateRip rating of the stream that came off the original CD are all things I like to be able to find out. And, as my more recent post demonstrates, reading FLAC tags and reading from and writing to ID3 (MP3) tags is all natively supported in PHP. So, I can just get the FLAC tags, and embed them in the outgoing MP3 file.
Possible lack of resume support
I haven't yet tested whether this is actually a problem or not, but it is still worth considering. When downloading a large file, connections may be lost, or some other issue may come up, and it is a waste of time and resources to download the whole file again. Now, there is a good chance this is going to be entirely handled by the webserver itself - it'll take all the content output and return the correct section. However, if this is not the case, the PHP could easily be made to handle this too. You first have to set the Accept-Ranges response header to bytes. Then, you check for the Range (and If-Range) request headers, which specify which range of bytes should be sent in the response - and the ETag to match against. Then, you just skip through the file the correct number of bytes, and send the number asked for.
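The range handling can be sketched roughly like this. It is Python rather than PHP, the function names are invented for the example, and it only copes with a single byte range (no multi-range requests, and If-Range is left out entirely), so treat it as an outline rather than a complete implementation.

```python
import re
from typing import BinaryIO, Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Turn a single-range Range header into inclusive (start, end) offsets.

    Returns None when the header is malformed, unsupported or unsatisfiable.
    """
    match = _RANGE_RE.match(header.strip())
    if match is None or size <= 0:
        return None
    first, last = match.groups()
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        if last == "" or int(last) == 0:
            return None
        return max(size - int(last), 0), size - 1
    start = int(first)
    if start >= size:
        return None
    # An open-ended or over-long range is clamped to the end of the file.
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return start, end


def range_response_headers(start: int, end: int, size: int) -> dict:
    """Headers to send alongside a 206 Partial Content status."""
    return {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }


def read_range(fh: BinaryIO, start: int, end: int) -> bytes:
    """Skip to the start offset and read exactly the bytes asked for."""
    fh.seek(start)
    return fh.read(end - start + 1)
```

When `parse_range` returns None for a well-formed but unsatisfiable request, the usual response is 416 Range Not Satisfiable; with no Range header at all you just send the whole file with a 200.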
No cache clean up
Simples, this one. There is no mechanism for stale files to be removed from the local cache, so disc space is steadily consumed. OM NOM NOM disc space. This is most easily fixed with a cron job, or similar, that runs every now and again and deletes files older than a certain age.
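The sort of thing the cron job could run looks something like the sketch below. It is Python rather than PHP, the function name and the choice of modification time as the measure of staleness are my assumptions, and it only looks at files directly inside the cache directory.

```python
import os
import time
from typing import List, Optional


def remove_stale_files(cache_dir: str, max_age_seconds: float,
                       now: Optional[float] = None) -> List[str]:
    """Delete regular files in cache_dir last modified too long ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    stale = []
    with os.scandir(cache_dir) as entries:
        for entry in entries:
            # Leave sub-directories and symlinks well alone.
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                stale.append(entry.name)
    # Delete only after the directory scan has finished.
    for name in stale:
        os.remove(os.path.join(cache_dir, name))
    return sorted(stale)
```

Called once a day with a `max_age_seconds` of, say, a week, this keeps the cache from growing without bound. Using access time instead would keep frequently played tracks around, but only works if the filesystem actually records it.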
Dodgy file system security
There is the potential in the indexing system I used that a user could easily change the parameters to allow them to start reading arbitrary parts of the filesystem. Well, that is of course assuming that the www-data user can read them (not true, because I spend far too much time neatening permissions). It might, however, enable them to read my web server configuration files, and thus find a slightly more useful exploit to exploit. A chroot-like method would be the best defence here (the assumption of no symlinks out of the music folders is, for the most part, a good one...)
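Short of a real chroot, a cheap approximation is to resolve every requested path and refuse anything that lands outside the music folder. The sketch below is in Python rather than PHP, and the function name is made up for the example. Because it resolves symlinks before checking, it also catches links that point out of the music folder.

```python
import os


def resolve_inside(root: str, requested: str) -> str:
    """Resolve a user-supplied path and insist it stays under root.

    Raises PermissionError for anything that escapes, whether through
    "..", an absolute path, or a symlink pointing elsewhere.
    """
    root_real = os.path.realpath(root)
    # Joining with an absolute "requested" discards root entirely,
    # which the containment check below then rejects.
    target = os.path.realpath(os.path.join(root_real, requested))
    if os.path.commonpath([root_real, target]) != root_real:
        raise PermissionError(f"{requested!r} is outside the music folder")
    return target
```

Every file read in the indexing code would then go through `resolve_inside` first, so a tampered parameter produces an error rather than the contents of a config file.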
No playlist streaming
Ok, this is a bit out of the scope of the whole thing, but a radio-like continuous stream would be a nice idea. I mean, it would be nice to just be able to point a streaming radio player at it and have music for the rest of the day. A random folder selection, and then playing that folder in string comparison order. Because jumping between random files would result in such huge genre shifts my mind might implode.
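The folder selection part of that is simple enough to sketch. This is Python rather than PHP, the function name and the list of extensions are assumptions for the example, and it says nothing about how the tracks then get stitched into one continuous stream.

```python
import os
import random
from typing import List


def pick_album_playlist(music_root: str,
                        extensions=(".flac", ".mp3"),
                        rng=random) -> List[str]:
    """Pick one random folder that contains tracks and list them in order.

    Tracks are sorted by plain string comparison of their file names.
    Returns an empty list when no folder contains any tracks.
    """
    albums = []
    for dirpath, _dirnames, filenames in os.walk(music_root):
        tracks = sorted(f for f in filenames if f.lower().endswith(extensions))
        if tracks:
            albums.append((dirpath, tracks))
    if not albums:
        return []
    dirpath, tracks = rng.choice(albums)
    return [os.path.join(dirpath, t) for t in tracks]
```

Calling it again each time a folder finishes gives the "music for the rest of the day" behaviour, one album at a time.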
Of course, I haven't covered things like legalities, authentication, and the like.
[1] Did You Know?: RFC stands for 'Request For Comments'