Big Data, Mining, and (Musical) Recommendation Engines

As a side project in my free time I’m helping a small business set up an e-commerce storefront. One of the things we’ve discussed is the idea of a recommendation engine to suggest other items to purchase. This led down an Internet rabbit hole where I ended up reading about The Echo Nest.
The Echo Nest is a self-described “music intelligence platform that synthesizes billions of data points and transforms it into musical understanding.” It is widely heralded as one of the largest and most comprehensive uses of data mining (to find the language and culture around music across the web) and big data (to store and present those relationships) within the music recommendation industry.
Yes! There is an industry. A substantial one. Apple’s Genius feature in iTunes, Pandora, Last.fm, Spotify – all are trying to provide relevant music based upon your listening tastes. Why? So you’ll buy more music, of course!
Brian Whitman, one of the co-founders of The Echo Nest, talks at great length about the how and why behind what makes their product so unique – and so incredibly accurate. I won’t steal the thunder of the article, but suffice it to say, dedication and refinement are key.
This is totally sausage-making, behind-the-scenes stuff, but I encourage you to at least look it over.
Ok, so now the really fun stuff. Here’s something called The Infinite Jukebox. It uses some of the data points within the Echo Nest to create a version of a given song that never ends. It finds points within a song that are similar to other points in the same song, makes some minor adjustments when needed (like tempo), and then plays the song forever. The presentation is neat as well: you can view the branches within the song where things loop and even click around the song to find points where things can loop.
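To make the “similar points within a song” idea a bit more concrete, here is a toy sketch in Python. This is not how The Infinite Jukebox or The Echo Nest actually work under the hood; the two-number “feature vectors” per beat, the threshold, and the function name are all made up for illustration. It only shows the general idea of finding pairs of beats that are close enough to jump between.

```python
import math

def branch_points(beats, threshold=0.1, min_gap=2):
    """Return (i, j) index pairs of beats similar enough to jump between.

    beats     -- list of feature vectors, one per beat (hypothetical data)
    threshold -- maximum Euclidean distance to count as "similar" (assumed)
    min_gap   -- ignore neighbouring beats so a jump actually goes somewhere
    """
    pairs = []
    for i in range(len(beats)):
        for j in range(i + min_gap, len(beats)):
            if math.dist(beats[i], beats[j]) <= threshold:
                pairs.append((i, j))
    return pairs

# Five made-up beats: beat 3 resembles beat 0, and beat 4 resembles beat 1.
beats = [(0.1, 0.9), (0.8, 0.2), (0.5, 0.5), (0.12, 0.88), (0.79, 0.25)]
print(branch_points(beats))  # [(0, 3), (1, 4)]
```

A player could then, on reaching beat 3, occasionally jump back to beat 0 instead of carrying on, which is roughly what those loop branches in the visualization represent.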

At work we’re looking at ways of using the topics of big data, mining, and recommendation engines to provide better healthcare. Reading about The Echo Nest gives me some ideas on how these technologies could impact the care we give! If you have your own ideas or suggestions, please leave a note below.

Semantic MediaWiki Templates and #arraymaps are Awesome

Templates are awesome.

As I’ve written about before, we use Semantic MediaWiki extensively at work. One way we use it is to handle research requests for our Solution Architecture team. We have a customer-facing form for all requests, and the resulting page is accessible to anyone across the organization – sharing our findings beyond the original requester.

For any requests submitted, the form creates a new wiki article as a sub page of “Research”. This is done by adding an attribute of “query string=super_page=Research” to the form.

It helps us to keep things organized by denoting which pages are specific to research vs. general wiki pages.

The problem is how semantic queries display pages that have a ‘super page’ prefix. By default the query results will show the super page as part of the formatting.

[Screenshot: Demo of a default query with no template]

 

See the “Research/” prefix on every item? That’s rather redundant (and ugly), so I sought out a way to remove the ‘Research/’ prefix when displaying the results, but still provide the correct link to the sub page.

The magic is two-part. First, you need to make sure your #ask query has the attribute “link” set to none (link=none) and “format” set to template (format=template). This strips out any default formatting of the results. Here’s the #ask query we’re using. Note that you’ll want to change the property names to fit your own properties.

{{#ask: [[Category:Research]]
|?Research_Level_Requested
|?Research_Submitted_Date
|limit=15
|link=none
|format=template
|template=Research Results Template
|order=DESC
|default=No related research found. Submit a [[Research]] request?
|searchlabel='''15 Most Recent Research Requests Loaded. View all Research?'''
}}

Then, for your template you’ll use the #arraymap function to format the output.

{{#arraymap:{{{1}}}|Research/|noSuperName|[[Research/noSuperName|noSuperName]]}} – {{{2}}} – {{{3}}}

For each result, this removes the “super-page” prefix (in this case Research/) from the first property returned – the page name. It does so by treating the prefix as the delimiter that #arraymap splits on.

{{#arraymap:{{{1}}}|Research/|

Whatever is left of the page name is then assigned to the variable noSuperName.

|Research/|noSuperName|

Finally we actually construct a normal internal wiki hyperlink by adding “Research/” and the variable together in the proper syntax.

[[Research/noSuperName|noSuperName]]

The remaining variables {{{2}}} and {{{3}}} are called as normal and a break tag is added to keep each query result on its own line.
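If the wikitext is hard to follow, here is a rough Python approximation of what #arraymap is doing in that template. It is only a sketch of the behaviour described above (split on the delimiter, drop empty pieces, substitute each piece into the formula, join the results); the Python function and its parameter names are my own and are not part of any MediaWiki extension.

```python
def arraymap(value, delimiter, var, formula, new_delimiter=", "):
    """Approximate #arraymap: split, drop empties, substitute, rejoin."""
    # Split the incoming value on the delimiter and trim whitespace.
    pieces = [piece.strip() for piece in value.split(delimiter)]
    # Empty pieces (such as the one before a leading "Research/") are skipped.
    pieces = [piece for piece in pieces if piece]
    # Replace the variable name in the formula with each remaining piece.
    return new_delimiter.join(formula.replace(var, piece) for piece in pieces)

# "Research/Hover Cars" is a hypothetical page name returned by the query.
link = arraymap(
    "Research/Hover Cars",
    "Research/",
    "noSuperName",
    "[[Research/noSuperName|noSuperName]]",
)
print(link)  # [[Research/Hover Cars|Hover Cars]]
```

The link still points at the full sub page, but the visible label no longer carries the prefix.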

The result is something like the following screenshot.

[Screenshot: Custom template sans-super page prefix]

You will then have nicely formatted results that are easier to digest.

I hope this helps those looking to extend the semantic queries and produce clean, repeatable results. Let me know in the comments if you have any questions or ideas of your own.

Use Semantic MediaWiki & Semantic Forms to Create a Folksonomy for Tagging Related Pages

At work we use Semantic MediaWiki to augment an internal wiki running on MediaWiki. It’s used to house anything from process documentation to troubleshooting guides for our IT department. We recently figured out how to use Semantic Forms and the #ask function to create a customizable and reusable folksonomy. Read on to find out more.

---

One of the functions of my team is to fulfill research requests for co-workers within our IT department. These requests can be as simple as finding a white paper from a vendor or research organization, or as in-depth as custom analysis and reports on a given topic.

In order to handle these requests, we’ve created a submission and request fulfillment process using Semantic Forms.

Co-workers can fill out the form and we’ll use the resulting wiki page to fulfill the request.

One of the fields in the form that we use when fulfilling the request is an open text box for tagging related topic areas. Those fulfilling the research request can enter a comma-separated list of items to generate a folksonomy that can be used elsewhere on the wiki.

In the form we have the following. The property “Research Related Tags” is a property with the type of “Page”.

{{{field|RelatedTags|property= Research_Related_Tags |values_from_property= Research_Related_Tags}}}

 

Then for our template, we have the following.

{{Research Entry Template|RelatedTags=}}

The following is to query the semantic data and display it.

{{#arraymap:{{{RelatedTags|}}}|,|x|[[Research_Related_Tags::x]]}}

The #arraymap function splits the comma-separated list of tags, sets the “Research Related Tags” property for each one, and displays the tags in the template.


For example, I might get a request for researching more about Hover Cars. Hover Cars might be related to other wiki pages such as our Transportation page or a page titled Automobiles. If I enter a comma-separated list of related pages into the tag box when fulfilling a research request (such as ‘transportation’ or ‘automobiles’), links to the research documentation will automatically be created on any page that matches that name.
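As a small illustration of the tagging step, here is a Python sketch of how a tag box value turns into one semantic annotation per tag. The function name is hypothetical and this only mimics the effect of the #arraymap call in the template; the wiki does the real work.

```python
def tag_annotations(related_tags, prop="Research_Related_Tags"):
    """Turn a comma-separated tag string into one annotation per tag."""
    tags = [tag.strip() for tag in related_tags.split(",")]
    # Skip empty entries, e.g. from a trailing comma in the tag box.
    return [f"[[{prop}::{tag}]]" for tag in tags if tag]

for annotation in tag_annotations("transportation, automobiles"):
    print(annotation)
# [[Research_Related_Tags::transportation]]
# [[Research_Related_Tags::automobiles]]
```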

Now the cool part is that we have a lot of existing content elsewhere in the wiki and we could never predict what new content is going to be created.

What we’ve done is modify the default template for every wiki page to pull back any research documentation related to that page. If you were on our Automobiles page, at the bottom you would find a link to any research requests tagged Automobiles. Automagically!

[Screenshot: Silly nonsensical test items all tagged with “Big Data”]

To do this, we use the #ask parser function to query the “Research Related Tags” property, but only show research requests that match the current page name.

{{#ask: [[Category:Research]] [[Research_Related_Tags::~{{FULLPAGENAME}}]]
|? Research_Level_Requested = Research Type
| ?PAGENAME= Entry Title
|format=ol
|limit=10
|link=subject
|default=No related research found. Submit a [[Research]] request?
|searchlabel=More Research Information
}}

The secret sauce is in this opening line.

{{#ask: [[Category:Research]] [[Research_Related_Tags::~{{FULLPAGENAME}}]]

This starts the inline query, limited to pages in the Research category that have a value for “Research Related Tags” similar (~ is a semantic wildcard) to the current FULLPAGENAME.

The rest of the ask command is pretty standard Semantic MediaWiki syntax. The one additional item to point out is the default= condition. As I mentioned earlier, this query is on every wiki page and some (a lot of) wiki pages won’t have related tags.

If no research exists, users are given the suggestion of submitting a research request. When new pages are created and they match existing research (or vice versa), this part of the page will automatically update with related research.
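For anyone who thinks better in code than in wikitext, here is a Python sketch of the logic that query expresses: find research entries whose tags match the current page name, cap the list, and fall back to a default message. The sample entries, the function names, and the simple title normalization are assumptions for illustration; the real matching rules are whatever Semantic MediaWiki applies.

```python
def normalize(title):
    """Loosely mimic wiki title handling: underscores to spaces, first letter upper."""
    title = title.strip().replace("_", " ")
    return title[:1].upper() + title[1:]

def related_research(entries, page_name, limit=10):
    """Return titles of research entries tagged with the given page name."""
    wanted = normalize(page_name)
    hits = [
        title
        for title, tags in entries.items()
        if wanted in {normalize(tag) for tag in tags}
    ]
    return hits[:limit]

# Hypothetical research pages and the tags entered when fulfilling them.
entries = {
    "Research/Hover Cars": ["transportation", "automobiles"],
    "Research/Data Lakes": ["Big Data"],
}

results = related_research(entries, "Automobiles")
print(results or "No related research found. Submit a Research request?")
```

A page with no matching tags gets the default message, which is the same job default= does in the #ask query.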

 

I hope this provides inspiration for a new way of extending the use of semantic data in your MediaWiki environment. Leave a comment if you have any questions.

Management vs. Leadership

I was listening to The Critical Path podcast last week while at the gym and had a thought.
(at 22:00) Horace Dediu says the following,
“The dilemma is that management and leadership are one of these water and oil types of things. It turns out that distinction between management and leadership: – Management is keeping things running. Leadership is breaking things and not keeping things running the way they are. 
Leadership is about change, management is about the avoidance of change. Many times we embody with the word manager both of these things. so there’s an almost implicit recognition of the duality of management.”
I keep hearing folks talking about how frustrated they are when dealing with management in relationship to the work they do. Some managers get the bigger picture, others are totally heads down and don’t want to be bothered.
Are we expecting too much out of managers? People whose job is to keep the status quo vs. leaders – those whose job is to disrupt that work?
---
I sent the above to a few co-workers I admire. One of them had the following to say,

“My adage on organizational hierarchies:

If you aren’t thinking past the work of “management” then you shouldn’t be promoted past the level of “manager.”

Have I over-simplified sufficiently?”
Spot on.

Email is Dead, Long Live Email

Note: I originally posted this to our internal discussion board at work looking for feedback. I wanted to share and archive it here as I think it’s a common problem for a lot of people and organizations.

---

I’ve been thinking a lot about how we communicate as an organization and would like to publicly decree that I’m out to kill how we use email. No, I’m not going to go dismantle the Exchange server or anything like that, but rather I’d like to help figure out what went wrong when email was unleashed upon the world and take back control of our time and attention.

Like some villainous fungus sprung up in a night, we have no natural defenses against it. We check it so frequently that it’s become an almost Pavlovian response to look at a screen when you hear a small chime. We use it to send out time-sensitive and critical information, yet it was designed to be a passive, asynchronous medium. We check it at work and at play – at times of the day when you can’t do anything about it other than fret. I’ve even been in meetings where people are checking email for other work and not paying attention to the work at hand!

What can we do to establish a more productive and sane email culture? I realize it’s not solely email’s fault for our difficulty in focusing on one thing at a time (and don’t get me started on the myth of multitasking), but we have to start somewhere.

So let’s start. Please join me and share your best advice for handling email. What tips and tricks do you use to help keep your head above water and remain a well-functioning and communicative co-worker?

I’ll kick off with a few ideas of my own.

  • How do you create intelligent subject lines?
    Are you asking a question? Start the subject with “Question – blah blah blah”. Meeting invite? “Meeting Invite – blah blah blah”
  • Best practices for managing inbox cruft?
    You’ll never read those 113 newsletters you subscribed to. Unsubscribe now, it’s ok. Use filters to automatically prioritize work. Email from a person or thing (like MTS Alerts) that is often high priority – filter it to an “Important” sub-folder. Med or low priority to a “Not So Important” sub-folder.
  • What’s the socially acceptable way to reply to a confirmation of something?
    Answer: single sentence email replies – bad. Elaborate or don’t bother to reply.
  • Did anyone ever attend “How to use Email and not go Totally Insane 101” when they first got email?
    No!? Why not? Should we host a Communities of Knowledge or workshop around email best practices?
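
The subject-line and filtering ideas above could be sketched in a few lines of Python. This is purely illustrative: the sender addresses are made up, the folder names come from the bullet above, and a real mail client would express this as filter rules rather than code.

```python
# Hypothetical sender lists; swap in the people and systems that matter to you.
HIGH_PRIORITY = {"mts-alerts@example.com", "boss@example.com"}
LOW_PRIORITY = {"newsletter@example.com"}

def folder_for(sender):
    """Pick a folder for a message based only on who sent it."""
    sender = sender.strip().lower()
    if sender in HIGH_PRIORITY:
        return "Important"
    if sender in LOW_PRIORITY:
        return "Not So Important"
    return "Inbox"

def subject_line(kind, summary):
    """Prefix the subject with the kind of email, e.g. Question or Meeting Invite."""
    return f"{kind} - {summary}"

print(folder_for("MTS-Alerts@example.com"))                  # Important
print(subject_line("Question", "Are we meeting on Friday?"))  # Question - Are we meeting on Friday?
```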

Here are some recent and related articles I’ve been reading that have me all riled up. I don’t have all the answers on how we can better leverage our time – and more importantly our attention – but here’s to a start.

And of course the always excellent Merlin Mann’s “Inbox Zero” (YouTube)