Filesystem RSS
The file server in my flat, Eliza, got upgraded from 2TB to 4TB of storage over the weekend. A most glorious undertaking, which resulted in the power supply no longer fitting into the case, so it now sits just beside it instead. Still, that's probably doing wonders for the air circulation.
The extra space allowed me to catch up on adding the latest batch of CDs to the music collection. However, the collection is now so large that it's getting difficult to spot what's new. So, at the suggestion of a friend (and with various ideas from them along the way), I created an RSS feed of the music being added.
inotify
As it's all running under Linux, my first stop was inotify, the Linux filesystem notification API. It has one very, very major flaw: it's non-recursive, so you have to add a watch to every folder in order to be notified about its contents. It does notify you when a folder is created, but it's perfectly possible for files to be written into that folder before you have time to add a watch to it. And the libraries I found, which were otherwise excellent, did not report the files already sitting in a newly created folder as just created. I didn't have the will to write anything complicated to work around this.
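For illustration, here is a rough sketch of what the "complicated" version might involve: watch every directory, and whenever a new directory appears, add a watch to it and then scan it for files that may have landed before the watch existed. The watching part assumes the PECL inotify extension (inotify_add_watch, inotify_read); the helper names scan_music_files, watch_tree and handle_events are hypothetical, and none of this was part of the setup described here.

```php
<?php
// Sketch only: the watching part assumes the PECL inotify extension.

// Hypothetical helper: list the mp3/flac files already inside a directory tree.
function scan_music_files($dir)
{
	$found = array();
	$files = new RecursiveIteratorIterator(
		new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS));
	foreach ($files as $path => $info)
	{
		if ($info->isFile() && preg_match('/\.(mp3|flac)$/i', $path)) $found[] = $path;
	}
	sort($found);
	return $found;
}

// Hypothetical helper: an inotify watch only covers the directory it was added
// to, so add one for $dir and, recursively, for every sub-directory.
function watch_tree($fd, $dir, array &$watches)
{
	$mask = IN_CREATE | IN_DELETE | IN_MOVED_TO | IN_MOVED_FROM;
	$watches[inotify_add_watch($fd, $dir, $mask)] = $dir;
	foreach (new DirectoryIterator($dir) as $entry)
	{
		if ($entry->isDir() && !$entry->isDot()) watch_tree($fd, $entry->getPathname(), $watches);
	}
}

// Hypothetical event handler: returns the paths of newly created music files.
// A new directory is watched first and then scanned, because files may have
// been written into it before the watch existed.
function handle_events($fd, array &$watches)
{
	$created = array();
	foreach (inotify_read($fd) as $event)
	{
		$path = $watches[$event['wd']] . '/' . $event['name'];
		if (($event['mask'] & IN_CREATE) && ($event['mask'] & IN_ISDIR))
		{
			watch_tree($fd, $path, $watches);
			$created = array_merge($created, scan_music_files($path));
		}
		elseif (($event['mask'] & IN_CREATE) && preg_match('/\.(mp3|flac)$/i', $path))
		{
			$created[] = $path;
		}
	}
	return $created;
}
```

A caller would create the descriptor with inotify_init(), call watch_tree once on the music root, and then call handle_events in a loop. Even then, a file written just after a new directory's watch is added can be reported twice (once by the scan, once by its own event), so whatever consumes the results has to tolerate duplicates.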
Polling
Naively polling a folder structure that big is a waste of resources. However, I found a very efficient way of doing it: using the Unix program 'find' to get a list of all the (relevant) files. If you keep the previous list, you can diff the two to see which files have been added or removed, which gives me all the information needed to generate a basic RSS feed.
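The idea behind the list comparison can be sketched in a few lines of PHP: given the stored list and the new list (one path per line), anything only in the new list was added, and anything only in the old list was removed. The function name diff_listings is made up for this example; the actual script leaves this work to the diff command.

```php
<?php
// Hypothetical helper: compare two file listings, one path per line,
// and report which paths were added and which were removed.
function diff_listings($old, $new)
{
	// Split into lines, dropping the empty entry left by a trailing newline
	$oldFiles = array_filter(explode("\n", $old), 'strlen');
	$newFiles = array_filter(explode("\n", $new), 'strlen');
	return array(
		'added'   => array_values(array_diff($newFiles, $oldFiles)),
		'removed' => array_values(array_diff($oldFiles, $newFiles)),
	);
}

$old = "/srv/media/music/A/01.flac\n/srv/media/music/A/02.flac\n";
$new = "/srv/media/music/A/01.flac\n/srv/media/music/B/01.mp3\n";
print_r(diff_listings($old, $new));
```

Because array_diff ignores ordering, the lists don't need to be sorted here. diff, by contrast, compares line by line, so the order in which find lists the files matters.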
The data was written to a MySQL database, mainly because I had one handy and it's easy. For each change, the stored data was a timestamp, a UUID, the file path, and the action which occurred (mapped onto the constant names from inotify, as I designed the schema when I still had high hopes for it :P).
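The table definition isn't shown here, so the following is only a guess at what the writes table could look like. It generates the DDL from a hypothetical map of diff markers to inotify action names, so the two can't drift apart. The column names match the INSERT in the import script; the column types, the index and the names WRITE_ACTIONS, writes_schema and action_for_marker are all assumptions.

```php
<?php
// Hypothetical map from diff's line markers to inotify constant names.
const WRITE_ACTIONS = array('>' => 'IN_CREATE', '<' => 'IN_DELETE');

// Hypothetical DDL for the writes table: the column names match the INSERT
// used by the import script, but the types and the index are guesses.
function writes_schema()
{
	$actions = "'" . implode("', '", WRITE_ACTIONS) . "'";
	return "CREATE TABLE writes (\n"
		. "  timestamp DATETIME NOT NULL,\n"
		. "  guid CHAR(36) NOT NULL PRIMARY KEY,\n"
		. "  file VARCHAR(1024) NOT NULL,\n"
		. "  action ENUM({$actions}) NOT NULL,\n"
		. "  INDEX (timestamp)\n"
		. ")";
}

// Returns the action name for a diff marker, or null for any other character.
function action_for_marker($marker)
{
	return array_key_exists($marker, WRITE_ACTIONS) ? WRITE_ACTIONS[$marker] : null;
}

echo writes_schema(), "\n";
```

The index on timestamp is there because the feed query sorts on that column and takes only the most recent rows.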
The script comes in two files: one for dealing with the file system, and one for dealing with MySQL. I could merge them, but I dislike PHP's methods for calling shell commands, and it works well enough like this.
#!/bin/sh
# Stop at the first failing command, so the stored list is only
# replaced once the PHP script has processed the changes
set -e
DATA=/home/benedict/scripts/data
# Get a sorted list of all mp3 and flac files (sorting keeps the order
# stable between runs, so diff only reports real additions and removals)
find /srv/media/music/ -type f \( -name '*.mp3' -o -name '*.flac' \) | LC_ALL=C sort > "$DATA/newdiff"
# On the first run there is nothing to compare against, so just store the list
if [ ! -f "$DATA/olddiff" ]; then mv "$DATA/newdiff" "$DATA/olddiff"; exit 0; fi
# Diff it with the stored list; diff exits with 1 when the lists differ, which is not an error
diff --speed-large-files "$DATA/olddiff" "$DATA/newdiff" > "$DATA/diff" || [ $? -eq 1 ]
# Send this off to the PHP script; if it fails, set -e stops the script here
/home/benedict/scripts/music_diff2 "$DATA/diff"
# Update the stored list with the new data, and remove temporary files
mv "$DATA/newdiff" "$DATA/olddiff"
rm "$DATA/diff"
#!/usr/bin/php
<?php
// Open the diff file; exit with a non-zero status on failure, so the
// shell script stops before replacing the stored list
$f = @fopen($argv[1], 'rb');
if ($f === false) exit(1);
// Connect to db. The old mysql_* functions were removed in PHP 7, so this
// uses mysqli, set up to throw (and so exit non-zero) on any database error
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$db = new mysqli('localhost', [REDACTED], [REDACTED], 'musics');
// A prepared statement means the file name never needs escaping by hand;
// bind_param binds by reference, so each execute() uses the current values
$insert = $db->prepare('INSERT INTO writes(timestamp, guid, file, action) VALUES (?, UUID(), ?, ?)');
$insert->bind_param('sss', $mtime, $file, $action);
// fgets returns false at the end of the file
while (($line = fgets($f)) !== false)
{
	// The first character of the line represents what is going on the line
	// Added lines look like:
	// > Content I added
	// Removed lines look like:
	// < content I Removed
	// Everything else is meta-data or context
	$mode = substr($line, 0, 1);
	if (($mode != '<') && ($mode != '>')) continue;
	// Remove the '> ' or '< ' prefix and the trailing line ending
	$file = rtrim(substr($line, 2), "\r\n");
	// For compatibility with inotify flags
	$action = ($mode == '>' ? 'IN_CREATE' : 'IN_DELETE');
	// Modification time: either the file's mtime, or now if the file
	// no longer exists. Convert to MySQL format
	$mtime = (file_exists($file) ? filemtime($file) : time());
	$mtime = date('Y-m-d H:i:s', $mtime);
	// Add the change in file to the database
	$insert->execute();
}
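The line handling in the loop above can also be pulled out into a pure function, which makes it easy to check against sample diff output without touching the database. This is a sketch: parse_diff_line is a hypothetical name, and it assumes diff's default ("normal") output format.

```php
<?php
// Hypothetical helper: turn one line of diff's normal output into an
// (action, file) pair, or return null for hunk headers such as "3a4"
// and the "---" separator lines.
function parse_diff_line($line)
{
	$mode = substr($line, 0, 2);
	if ($mode === '> ') $action = 'IN_CREATE';
	elseif ($mode === '< ') $action = 'IN_DELETE';
	else return null;
	// Drop the two-character marker and any trailing line ending
	return array($action, rtrim(substr($line, 2), "\r\n"));
}

$sample = array("3a4\n", "> /srv/media/music/New Album/01.flac\n");
foreach ($sample as $line) var_dump(parse_diff_line($line));
```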
Generating the RSS
This was actually a surprisingly difficult task, and my code isn't really ready to be shown to the world. But it's been far too long since I last posted, so I'm going to run with it.
We grab the most recent batch of entries from the database, ordered by timestamp from newest to oldest. Each RSS story is started with the next available file operation, and subsequent file operations are added to it until one is encountered with either a different directory or a different action. This is a rough-and-ready way of grouping the file operations, so that lots of files being added to one directory end up as a single story. Note that the loop will exit without outputting the last story. This is deliberate: we can't tell whether the final story is actually finished, or whether more records for it are available in the database beyond the query's limit. If we don't have all the records, we'll end up 'updating' the story with less information, to everyone's confusion1.
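Stripped of the SQL and the RSS output, the grouping step amounts to something like the following sketch. It works on plain arrays of rows shaped like the writes table (newest first), and group_stories is a hypothetical name; like the real loop, it deliberately drops the last group because it may be incomplete.

```php
<?php
// Hypothetical helper: group rows (newest first) into stories. A new story
// starts whenever the directory or the action changes; the final story is
// dropped, since more rows for it may lie beyond the query's LIMIT.
function group_stories(array $rows)
{
	$stories = array();
	$current = array();
	foreach ($rows as $row)
	{
		if (count($current) > 0)
		{
			$first = $current[0];
			$sameDir = dirname($row['file']) === dirname($first['file']);
			if (!$sameDir || $row['action'] !== $first['action'])
			{
				// The current story is complete: keep it and start a new one
				$stories[] = $current;
				$current = array();
			}
		}
		$current[] = $row;
	}
	// $current now holds the possibly incomplete last story, which is discarded
	return $stories;
}
```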
Depending on the number of entries in the story (one or many), a different story generation function is called. Both use the oldest UUID as the UUID of the story, which allows 'updates' to occur cleanly. If the files exist, we try to read their tags (as only mp3 and flac files are in use, this is surprisingly simple: there's an id3 extension for PHP, and there are some really good flac libraries out there2) and use the album/artist and track/artist in place of just the directory and file names.
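The choice of identifier and title for a multi-file story could be sketched like this, again on plain arrays. story_header is a hypothetical name, and the tag keys ('album', 'album artist') are the ones the feed script reads; which keys a given tag library actually returns is an assumption worth checking.

```php
<?php
// Hypothetical helper: pick the guid and title for a story. Rows are newest
// first, so the last row is the oldest; reusing its guid means a story that
// gains newer files on a later run keeps the same guid, so feed readers
// treat it as an update. $tags holds the tags read from the first file.
function story_header(array $story, array $tags)
{
	$oldest = $story[count($story) - 1];
	$dir = dirname($story[0]['file']);
	if ($story[0]['action'] !== 'IN_CREATE')
	{
		$title = 'Files removed from ' . $dir;
	}
	elseif (isset($tags['album']))
	{
		$title = 'Files added to ' . $tags['album'];
		if (isset($tags['album artist'])) $title .= ' by ' . $tags['album artist'];
	}
	else
	{
		// No tags available: fall back to the directory name
		$title = 'Files added to ' . $dir;
	}
	return array('guid' => $oldest['guid'], 'title' => $title);
}
```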
When I get round to polishing off my analysis of the music streaming service, including all the mistakes and bugs I conveniently ignored, I will be coming back and going over this block of code in detail. Until then, make of it what you will.
<?php
// The mysql_* functions were removed in PHP 7, so this uses mysqli instead
$db = mysqli_connect('localhost', 'root', 'root', 'musics');
header('Content-type: application/rss+xml');
$entries = mysqli_query($db, 'SELECT * FROM writes ORDER BY timestamp DESC LIMIT 3000');
$now = date('D, d M Y H:i:s O');
echo "<?xml version=\"1.0\"?>\n";
require('flacTags.php');
?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Eliza - New Music</title>
<link>http://eliza.harcourtprogramming.co.uk/~benedict/music/</link>
<description>RSS Feed of file creations and modifications in Eliza's music library</description>
<language>en-gb</language>
<pubDate><?php echo $now; ?></pubDate>
<lastBuildDate><?php echo $now; ?></lastBuildDate>
<managingEditor>ben.harcourt@harcourtprogramming.co.uk (Ben Harcourt)</managingEditor>
<webMaster>ben.harcourt@harcourtprogramming.co.uk (Ben Harcourt)</webMaster>
<atom:link href="http://eliza.harcourtprogramming.co.uk/~benedict/music/music.php" rel="self" type="application/rss+xml" />
<?php
// State shared with the story functions below
$path;
$action;
$story;
// Rows arrive newest first; the first row starts the first story
$item = mysqli_fetch_assoc($entries);
if ($item) reset_story();
while ($item = mysqli_fetch_assoc($entries))
{
	// Same directory and action as the current story: add it and move on
	if (pathinfo($item['file'], PATHINFO_DIRNAME) == $path && $item['action'] == $action)
	{
		$story[] = $item;
		continue;
	}
	// Otherwise the current story is complete: output it and start a new one
	output_story();
	reset_story();
}
// The last story is deliberately not output, as it may be incomplete
function reset_story()
{
	global $item, $story, $path, $action;
	$story = array($item);
	$path = pathinfo($story[0]['file'], PATHINFO_DIRNAME);
	$action = $story[0]['action'];
}
function output_story()
{
	global $story;
	if (count($story) == 1)
	{
		output_single_file($story[0]);
		return;
	}
	output_composite_story();
}
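function get_tags($file)
{
	$tag = false;
	switch (pathinfo($file, PATHINFO_EXTENSION))
	{
		case 'mp3':
			if (file_exists($file) && function_exists('id3_get_tag'))
			{
				$tag = id3_get_tag($file);
			}
			break;
		case 'flac':
			if (file_exists($file))
			{
				$flac = new flacTags($file);
				if ($flac->readTags()) $tag = $flac->getAllComments();
			}
			break;
	}
	// Fall back to just the file name if no tags could be read
	if (!is_array($tag)) $tag = array();
	$tag['file'] = $file;
	return $tag;
}
function track_name($tag)
{
	if (!isset($tag['title'])) return $tag['file'];
	// Disc and track numbers may be missing (id3 tags may use 'track' instead)
	$disc = isset($tag['discnumber']) ? $tag['discnumber'] . '-' : '';
	$track = isset($tag['tracknumber']) ? $tag['tracknumber'] : (isset($tag['track']) ? $tag['track'] : '');
	$artist = isset($tag['artist']) ? ' - ' . $tag['artist'] : '';
	return trim($disc . $track . ' ' . $tag['title']) . $artist;
}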
function output_single_file($item)
{
	switch ($item['action'])
	{
		case 'IN_CREATE':
			$header = 'New: ';
			$message = ' was added.';
			break;
		case 'IN_DELETE':
			$header = 'Deleted: ';
			$message = ' was deleted.';
			break;
		default:
			$header = 'Unknown action: ';
			$message = ' was in some way altered.';
	}
	$tag = get_tags($item['file']);
	$name = track_name($tag);
	// htmlspecialchars (unlike htmlentities) only produces entities XML understands,
	// and the HTML description goes inside CDATA so it isn't parsed as RSS elements
?>
		<item>
			<title><?php echo htmlspecialchars($header . $item['file']); ?></title>
			<link>http://eliza.harcourtprogramming.co.uk/</link>
			<description><![CDATA[
				<p>The file <?php echo htmlspecialchars($name) . $message; ?></p>
				<ul>
					<?php foreach ($tag as $k=>$v) echo '<li>'.htmlspecialchars($k).': '.htmlspecialchars(strlen($v) > 64 ? substr($v, 0, 64).'...' : $v)."</li>\n\t\t\t\t\t"; ?>
				</ul>
			]]></description>
			<pubDate><?php echo date('D, d M Y H:i:s O', strtotime($item['timestamp'])); ?></pubDate>
			<guid>http://eliza.harcourtprogramming.co.uk/~benedict/music/item.php?guid=<?php echo $item['guid']; ?></guid>
		</item>
<?php
}
function output_composite_story()
{
	global $story, $path, $action;
	// Rows arrive newest first, so the last one is the oldest; its UUID stays
	// the same as newer files join the story, so readers see an update
	$guid = $story[count($story) - 1]['guid'];
	$timestamp = $story[0]['timestamp'];
	if ($action == 'IN_CREATE')
	{
		// Prefer the album (and album artist, if tagged) over the directory name
		$tag = file_exists($story[0]['file']) ? get_tags($story[0]['file']) : array();
		if (isset($tag['album']))
		{
			$title = 'Files added to ' . $tag['album'];
			if (isset($tag['album artist'])) $title .= ' by ' . $tag['album artist'];
		}
		else
		{
			$title = 'Files added to ' . $path;
		}
		foreach ($story as $k=>$v)
		{
			$story[$k] = track_name(get_tags($v['file']));
		}
	}
	else
	{
		$title = 'Files removed from ' . $path;
		foreach ($story as $k=>$v)
		{
			$story[$k] = $v['file'];
		}
	}
?>
		<item>
			<title><?php echo htmlspecialchars($title); ?></title>
			<link>http://eliza.harcourtprogramming.co.uk/</link>
			<description><![CDATA[
				<p><?php echo htmlspecialchars($title); ?></p>
				<ul>
					<?php foreach ($story as $v) echo '<li>'.htmlspecialchars($v)."</li>\n\t\t\t\t\t"; ?>
				</ul>
			]]></description>
			<pubDate><?php echo date('D, d M Y H:i:s O', strtotime($timestamp)); ?></pubDate>
			<guid>http://eliza.harcourtprogramming.co.uk/~benedict/music/item.php?guid=<?php echo $guid; ?></guid>
		</item>
<?php
} // End of output_composite_story()
?>
</channel>
</rss>
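A lot of the fiddliness in the template comes from escaping text by hand. One way to keep that in one place is a trio of small helpers, sketched below under the assumption that all the data is UTF-8; rss_text, rss_cdata and rss_date are hypothetical names, not part of the script above.

```php
<?php
// Hypothetical helpers for writing RSS by hand; assumes all data is UTF-8.

// Escape text for an XML element or attribute. ENT_XML1 only produces
// entities XML defines (&amp; &lt; &gt; &quot; &apos;), unlike htmlentities.
function rss_text($s)
{
	return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Wrap HTML in a CDATA section, splitting any "]]>" it contains so the
// section cannot be closed early.
function rss_cdata($html)
{
	return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $html) . ']]>';
}

// Convert a MySQL DATETIME string into the RFC 822 style date RSS expects.
function rss_date($mysqlTimestamp)
{
	return date(DATE_RSS, strtotime($mysqlTimestamp));
}

echo '<title>', rss_text('New: Rock & Roll/01 <Live>.flac'), "</title>\n";
```

In the template, item titles could go through rss_text, the HTML descriptions through rss_cdata, and the timestamp columns through rss_date.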