Friday, April 8, 2016

Honus: easy access to MLB highlights part 1

For this project I will gather data from MLB's online repository. The main goal is to create an interface for the highlight videos they provide online since they are very hard to access otherwise.

I have started by creating a GitHub account, which can be found here.  I'm using Linux and already have Git installed, so I'll follow these instructions to link my account.

I created a repo online that I'll clone to my computer using the "git clone <repo url>" command following these instructions.

First I'll try to make a small change and push it back to GitHub. I added a sentence to the README.md file. I had some trouble finding the right commands to push the changes back to GitHub, but eventually figured it out.

I first committed the changes using "git commit -a". Since I didn't supply a message with "-m", Git opens a text editor and the commit message is entered there. The "-a" means that every modified or deleted file Git is already tracking gets staged automatically, not just the ones I specifically added (brand-new files still need "git add"). Then to push the changes I used "git push origin master", and after I entered my username and password the changes were accepted and appeared on GitHub.

I'm going to try to use Google App Engine for hosting. I installed the Google App Engine SDK following the instructions, which included downloading PHP and MySQL.

I had a really difficult time getting even the sample Google provides to run. I kept getting an error that said "No URLMap entries found in application configuration", but it was probably my fault. The command that finally worked was
~/google_appengine/dev_appserver.py  --php_executable_path=/home/<USERNAME>/php-5.4.25/installdir/bin/php-cgi ~/Downloads/appengine-php-guestbook-phase0-helloworld/

Another tough part was getting the right path for PHP, since it wasn't installed where I expected it to be.

Next I'll begin building the page. Unfortunately my HTML/CSS, JavaScript, and PHP are all very rusty.

MLB stores a lot of their basic content on their online gd2 database, found here. What I'm looking for is the highlights, which are found in the media folder after selecting a day and game. I'll start with the Dodgers-Padres game from 6/14/2015 so we have Joc Pederson's spectacular game-saving grab.

I'll begin by copying the new_project_template folder in the GAE php folder into my repository folder. The first order of business is to try to get it to run, adjusting the command above. After renaming the template folder to "gae", and changing the directory to inside of the repo, it successfully ran with the command
~/google_appengine/dev_appserver.py  --php_executable_path=/home/<USERNAME>/php-5.4.25/installdir/bin/php-cgi ./gae 
To get the page contents from the MLB site I've used the function file_get_contents. So far my PHP file just has
<?php
$filename = "http://gd2.mlb.com/components/game/mlb/year_2015/month_06/day_14/gid_2015_06_14_lanmlb_sdnmlb_1/media/highlights.xml";
echo $filename;
$homepage = file_get_contents($filename);
echo $homepage;
?>

It prints a mess to the page that loads in the localhost as expected. Now we need to find a way to parse the highlights.xml file to get the headlines and their associated videos.

To do this I used the function simplexml_load_string as described on PHP.net. Now I can loop through all the media tags and grab all the headlines or links I want. By adding the following code I was able to print out each of the headlines and blurbs (I'm not sure which one I want to use yet, and the variable name $character is used below because I copied it from the example)
$xml = simplexml_load_string($homepage);
foreach ($xml as $character) {
   echo "<p>",$character->headline, ' : ', $character->blurb, PHP_EOL,"</p>";
}
Something that may be useful is that I can get the number of media tags using count($xml), which is 25 for this page. Also, if I want to access the ith media tag (counting from zero), I can do this using "$xml->media[$i]".
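
As a minimal sketch of the counting and indexing just mentioned, the snippet below parses a small made-up XML string and reads one entry by position. The two media entries and their text are invented for illustration and are not taken from the real highlights.xml.

```php
<?php
// Hypothetical sample shaped like highlights.xml; the entries are invented.
$sample = <<<XML
<highlights>
  <media><headline>First headline</headline><blurb>First blurb</blurb></media>
  <media><headline>Second headline</headline><blurb>Second blurb</blurb></media>
</highlights>
XML;

$xml = simplexml_load_string($sample);

// count() on the root element gives the number of child tags.
$total = count($xml);

// Children with the same name can be indexed like an array (zero-based).
// Casting to string turns the SimpleXMLElement into plain text.
$second = (string) $xml->media[1]->headline;

echo $total, " media tags; second headline: ", $second, PHP_EOL;
```

The string cast is worth keeping in mind: without it, $second would still be a SimpleXMLElement object rather than text.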

Now after getting this far I should make a commit to GitHub so I don't lose my work. The folder gae is new, so I'll add it with "git add gae/", and then commit with "git commit -a -m "Added simplexml_load_string"", and finally sync with GitHub using "git push origin master". The update shows up on GitHub, so it all looks good so far.

We're going to want a list of video titles, with only one video in the player at a time. To do this we'll need to create an array for each of the headlines, blurbs, and video URLs. We accomplish this by creating an empty array, then looping through each of the media tags and using array_push to add the headline/blurb/link to the array. The following code accomplishes this for the headlines
$headlines = array();
foreach ($xml as $media) {
array_push($headlines,$media->headline);
}
I did a similar thing for the blurbs and URLs. One problem that I see with the URLs is that all but one are mp4 files, the other being an rtmp streaming link with the full highlights (20 minutes) of the game. It looks like, aside from examining the URL, I can check the media tag for the attribute "media-type". The values for this attribute that I see are: C for condensed game, R for recap, T for the general highlight (which most are), and M, which appears to mark a clip that is longer than most, but I'm not sure what it means. Only the condensed game (C) is the rtmp link. I'll ignore this problem for now and try to see if I can get a printed list of headlines which will play the related video when clicked on.
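
For later reference, here is one possible way to skip the rtmp entry using the "media-type" attribute. This is only a sketch under assumptions: the sample XML, its headlines, and the example.com URLs are invented, and it assumes the C value always marks the rtmp link, as observed above.

```php
<?php
// Hypothetical sample; the attribute values and URLs are invented for illustration.
$sample = <<<XML
<highlights>
  <media media-type="T"><headline>A catch</headline><url>http://example.com/a.mp4</url></media>
  <media media-type="C"><headline>Condensed game</headline><url>rtmp://example.com/c</url></media>
  <media media-type="R"><headline>Recap</headline><url>http://example.com/r.mp4</url></media>
</highlights>
XML;

$xml = simplexml_load_string($sample);
$headlines = array();
$urls = array();

foreach ($xml->media as $media) {
    // A hyphenated attribute name cannot be written as ->media-type,
    // so array-style access is used instead.
    $type = (string) $media['media-type'];
    if ($type === 'C') {
        continue; // skip the condensed game, which is the rtmp link
    }
    $headlines[] = (string) $media->headline;
    $urls[] = (string) $media->url;
}

print_r($headlines);
```

Filtering inside the same loop keeps the $headlines and $urls arrays aligned, so the same index still refers to the same clip.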

To get a video to play when the headline is clicked will require the use of JavaScript and creating a function to call when there is a click (onclick). This should open the video player with the appropriate link. So first I'll create a video player using HTML, given by the following code
<video id="videoplayer" controls>
<source src="<Video URL here.mp4>" type="video/mp4">
Your browser does not support the video tag.
</video>
Now we should create a table. On the left side we'll have the list of headlines, the right side will be the video player. This gets really messy since I used a table inside the table for the headlines. There's probably a better way to do this, but this works fine for now. Here's the code that I added to the end of the PHP file.
<table>
<tr>
<td><table>
<?php
foreach ($headlines as $headline) {
echo "<tr><td>",$headline,"</td></tr>";
}
?>
</table></td>
<td>
<video id="videoplayer" controls>
<source src="http://mediadownloads.mlb.com/mlbam/2015/06/15/mlbtv_lansdn_165780483_1200K.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</td>
</tr>
</table>
And here's a screenshot of what shows up so far. The video is just the first link that I pasted there, it's not interactive yet.

To get the video to change when clicking the headline I'll need to set the onclick attribute for each of the headline td's to change the src attribute of the videoplayer, which can be identified using its id. I managed to get this to work by using the following code in place of the foreach section above. I had to change to a for loop so that I could have the index to access both the $headlines and $urls arrays.
for ($iii=0; $iii<count($headlines); $iii++) {
$headline = $headlines[$iii];
$url = $urls[$iii];
echo "<tr><td onclick='document.getElementById(\"videoplayer\").setAttribute(\"src\", \"",$url,"\");'>",$headline,"</td></tr>";
}
Now we have interactivity in getting the video we want to play. We'll still need to add the ability to pick the day and game, better formatting, fix the non-mp4 links (the rtmp link does nothing when clicked on), and add some sort of home page, but this is really good so far. So I'll commit it again to GitHub.



Now I'll add variables for the date ($year, $month, $day) and team ($team for the team you are looking for). The date variables for the current day can be set with
$year = date('Y');
$month = date('m');
$day = date('d');

We may later want to have the default day be yesterday, especially early in the day, but I'll leave that for later.
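
One way the "default to yesterday early in the day" idea could look is sketched below. The function name defaultTimestamp and the 10 o'clock cutoff are my own placeholders, not something from the project, and the hour is read in whatever timezone PHP is configured to use.

```php
<?php
// Assumption: before this hour, "today" has no games yet (the cutoff is arbitrary).
$cutoffHour = 10;

// Hypothetical helper: pick the timestamp to display, given the current time.
function defaultTimestamp($now, $cutoffHour) {
    // date('G') is the hour in 24-hour format without a leading zero.
    if ((int) date('G', $now) < $cutoffHour) {
        return strtotime('-1 day', $now);
    }
    return $now;
}

$ts = defaultTimestamp(time(), $cutoffHour);
$year = date('Y', $ts);
$month = date('m', $ts);
$day = date('d', $ts);
echo $year, $month, $day, PHP_EOL;
```

Passing the timestamp as the second argument to date keeps the three variables consistent with each other, even across a month or year boundary.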

First we'll make it so that the games from the day are all shown, maybe just along the top or side. To do this we'll use the master_scoreboard.xml file found for any given date (here's the one for the day I've been using so far). After using file_get_contents again to get the page, we will have to access the attributes of the XML tags instead of the content inside the tags as before. The examples here show how this can be done. I want to go through each game and print out the team names, score, inning or start time, and maybe create a link.

I'll use a table, then print the team names, the score and inning if they have started or else the start time. I went a bit overboard with the tables, and will need to clean it up later, but the following code gave me a working score table
<?php
// set date today
$year = date('Y');
$month = date('m');
$day = date('d');

// get scoreboard as XML
$dateurl = "http://gd2.mlb.com/components/game/mlb/year_".$year."/month_".$month."/day_".$day."/";
$datepage = file_get_contents($dateurl);
$datescoreboardurl = $dateurl."master_scoreboard.xml";
$datescoreboardpage = file_get_contents($datescoreboardurl);
$datescoreboardpagexml = simplexml_load_string($datescoreboardpage);


// create table of scores, each row is a game
echo "<table>";
foreach($datescoreboardpagexml as $a) {
echo "<tr><td>";
echo "<table><tr><td>",$a->attributes()->away_team_name,"</td></tr><tr><td>",$a->attributes()->home_team_name,"</td></tr></table>";
if (($a->status->attributes()->status)=="Final") {
echo "</td><td>";
echo "<table><tr><td>",$a->linescore->r->attributes()->away,"</td></tr><tr><td>",$a->linescore->r->attributes()->home,"</td></tr></table>";
echo "</td><td>";
echo "F";
} elseif (($a->status->attributes()->status)=="In Progress" || ($a->status->attributes()->status)=="Review") {
echo "</td><td>";
echo "<table><tr><td>",$a->linescore->r->attributes()->away,"</td></tr><tr><td>",$a->linescore->r->attributes()->home,"</td></tr></table>";
echo "</td><td>";
if (($a->status->attributes()->top_inning)=="Y"){echo "&#8593;";} else {echo "&#8595;";}
echo $a->status->attributes()->inning;
} elseif (($a->status->attributes()->status)=="Preview") {
echo "</td><td>";
echo "<table><tr><td>",$a->away_probable_pitcher->attributes()->last_name,"</td></tr><tr><td>",$a->home_probable_pitcher->attributes()->last_name,"</td></tr></table>";
echo "</td><td>";
echo $a->attributes()->time," ET";
} else {echo "Game status unknown";}
echo "</td></tr>";
}
echo "</table>";
?>
Here's a screenshot of what I have so far. Clearly I'll need to reformat the scores so they fit on the left side. Other changes on the way will be to select the date and team, better HTML/CSS, and more.
Since we've made good progress and I'm taking a break now, I saved it again to GitHub using "git commit -a -m "Added scoreboard"" and "git push origin master".





(I'm writing this over the course of a few months.) The laptop I had been using is giving me problems, so I'm switching to my other laptop, which means going from Linux to Windows. To make this switch I downloaded the GitHub application for Windows and the Google App Engine API. These are actually easier to use on Windows than from the Linux terminal.

I added a title by simply adding the following to the top
<h1 align='center'>Honus</h1>
I'm trying to make it so that the highlights are from a game from the current day. It should just start as the first one with the option to select other games and eventually to select other days. I'm realizing that I'll have an issue with $year, $month, and $day: do they refer to the current date or the date of the selected game? I'm going to change it to be the selected date. I'll change the variables to $yeartoday, $monthtoday, and $daytoday to try to reduce confusion. But the selected day will default to today, so I won't change much.

The hard part is to get the URL for the selected game. Selecting the first game by default, I was able to get the highlights for today's first game to show up using the following code. I get an error now since the game hasn't started and there are no highlights to load. I'll have to fix this later.
$away_code = $datescoreboardpagexml -> game[0] -> attributes() -> away_code;
$home_code = $datescoreboardpagexml -> game[0] -> attributes() -> home_code;
$filename = "http://gd2.mlb.com/components/game/mlb/year_{$year}/month_{$month}/day_{$day}/gid_{$year}_{$month}_{$day}_{$away_code}mlb_{$home_code}mlb_1/media/highlights.xml";
I fixed this after finding that $homepage has a certain string value when the page doesn't exist. So the code that gets the highlights only runs when $homepage is not the 404 value. Not the best method, but I think it'll work.
if ( substr($homepage,0,23) == "GameDay - 404 Not Found") {
echo "\nNo highlights yet";
} else {
$xml = simplexml_load_string($homepage);
echo $xml;
foreach ($xml as $media) {
array_push($headlines,$media->headline);
array_push($blurbs,$media->blurb);
array_push($urls,$media->url);
}
}
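
A slightly more defensive alternative to matching the 404 text is sketched here. It relies only on the fact that a page starting with plain text like the 404 message is not well-formed XML. The helper name parseHighlights is made up for this example.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or null when the
// page could not be fetched or is not well-formed XML.
function parseHighlights($contents) {
    if ($contents === false || $contents === '') {
        return null; // file_get_contents returns false on failure
    }
    // Collect XML parse errors quietly instead of printing warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($contents);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // Strict comparison matters: an element with no children is "falsy"
    // in an if statement even though parsing succeeded.
    return ($xml === false) ? null : $xml;
}

$xml = parseHighlights("GameDay - 404 Not Found");
if ($xml === null) {
    echo "No highlights yet", PHP_EOL;
}
```

This would not catch an error page that happens to be valid XML, so checking the 404 text may still be useful alongside it.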

To get the highlights from another game, we need a way to have PHP get the highlights again for the specified page. I can do this by creating a function that takes in the game number, then does the rest. I just need to put the code I have into a function with proper changes. Actually this is really hard. For each game in the sidebar, I'll have to add an onclick attribute. On click it will call a JS function with the input being the game number (some sort of ID). The function will actually have to reload the page (or go to a new page of some sort) so that the PHP can be run again. So I also need to pass information to PHP through the request. This StackOverflow question looks helpful. So when the page loads, each game will have an onclick function that goes to a URL specified by PHP using this method. But I'll also have to add a way to get the info out of the URL. The only info that has to be passed through the URL is the date and game number. It would be nice to just pass one of the team's abbreviations since it would be more meaningful, but that seems harder at this time.

Never mind, it'll actually be easier to pass a team name, though harder to map it back to a game, so I'll go with it. Here's the function I created. I decided to pass the entire date as YYYYMMDD to make it shorter, and it isn't too hard since the parts are all four- or two-character strings already (there would be trouble if the day were 9 vs 10, but 09 vs 10 is easy to handle). I'll just have to decipher it on the other side.

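function createGameOnclickURLForJS($gototeam, $gotoyear, $gotomonth, $gotoday) {
// the date parts should be passed as zero-padded strings, e.g. '08'
$gotodata = array (
'team' => $gototeam,
'date' => $gotoyear . $gotomonth . $gotoday
);
// the query string starts right after the "?" (no extra "=")
return "window.location.href = '/?" . http_build_query($gotodata) . "'";
}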
I added this to each game box with the following code (the tr was already there, I just added the onclick). The away_code is an object, but I found that by taking the substring from 0 to 3 I got the team's three-letter code. I hope they are all three letters, none two letters, otherwise this won't work (I'm pretty sure they are, although I'm still confused between the abbrev and code choices).
echo '<tr onclick="' . createGameOnclickURLForJS(substr($a->attributes()->away_code, 0, 3), '2013', '08', '12') . '"><td>';
So now when I click on the small box with the game score it goes to the URL as I'd hoped. Now I need to find a way to read this data in when the page loads. So first PHP will see if there is a specified game to go to. If not it will go to a default (probably the current day and my favorite team).
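
To make the shape of that URL concrete, here is a small standalone sketch of what http_build_query produces and how it can be read back. The team code 'lan' and the date are placeholder values.

```php
<?php
// Example values only; 'lan' and the date are placeholders.
$params = array('team' => 'lan', 'date' => '2013' . '08' . '12');

// http_build_query joins key=value pairs with & and URL-encodes them.
$query = http_build_query($params);
echo '/?' . $query, PHP_EOL; // prints /?team=lan&date=20130812

// parse_str reverses the process, which is roughly what PHP does
// when it fills $_GET on the next page load.
parse_str($query, $decoded);
echo $decoded['team'], ' ', $decoded['date'], PHP_EOL;
```

Keeping the date parts as strings is what preserves the leading zero in the month.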




I set it to use the date given through the URL if provided, otherwise use the current date. One problem was that the parse_date function read the month "08" as an integer, so it was cut to "8". I check its length with strlen, and if the length is 1 I add a 0 in front. Easy fix.
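
Another way to handle this step, sketched under assumptions, is to split the YYYYMMDD string with a regular expression so the parts stay as zero-padded strings. The helper name dateParts is hypothetical, and the fallback to today's date mirrors the behavior described above.

```php
<?php
// Hypothetical helper: split a YYYYMMDD string into zero-padded parts,
// falling back to today's date when the input is missing or malformed.
function dateParts($raw) {
    if (is_string($raw) && preg_match('/^(\d{4})(\d{2})(\d{2})$/', $raw, $m)
        && checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
        return array($m[1], $m[2], $m[3]);
    }
    return array(date('Y'), date('m'), date('d'));
}

// str_pad restores a leading zero that was lost in an integer conversion.
$padded = str_pad((string) 8, 2, '0', STR_PAD_LEFT); // "08"

// 'date' is the query parameter name used in the redirect URL.
$raw = isset($_GET['date']) ? $_GET['date'] : null;
list($year, $month, $day) = dateParts($raw);
echo $year, '-', $month, '-', $day, PHP_EOL;
```

The checkdate call also rejects impossible dates such as month 13, which a length check alone would let through.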


Now we'll add a way to change the date. I'm trying to add the datepicker tool for jQuery. First I'll create a script.js file for all my JavaScript functions. I also added a stylesheet.css since I was already in there. After this I had a lot of trouble getting the script.js to load. I had to add the paths to the app.yaml file, but then I had a stupid problem where I had "handlers:" before each of the handlers. When I deleted all but the first, it worked fine.

I need to find a way to get the datepicker to submit its date to a URL redirect so I can load that day. I added the datepicker to the HTML code using the following.

<table><tr>
<td>Date:</td> 
<td><input type="text" id="datepicker"> </td>
<td><button onclick='goToDatePicked()'> Go! </button></td>
</tr></table>
The following shows the function that the button's onclick calls. It adds the date to the window redirect URL. I should add the choice of team here as well, but that will require another input box.
function goToDatePicked() {
 var day = $( "#datepicker" ).datepicker( "getDate" ).getDate()  ;
 day = day.toString();
 if (day.length == 1) { day = "0" + day ;}
 var month =   $( "#datepicker" ).datepicker( "getDate" ).getMonth() + 1  ;
 month = month.toString();
 if (month.length == 1) { month = "0" + month ;}
 var year =   $( "#datepicker" ).datepicker( "getDate" ).getFullYear()   ;
 year = year.toString();
 console.log(day,month,year);
 window.location.href = '/?date=' + year + month + day ;
}

===============

The above was completed last summer and is available online at http://honus-1064.appspot.com/ with the GitHub repo at https://github.com/CoolCoolCoolCool/Honus. As of today (4/8/16) it seems that highlights do not appear for the 2016 season that just started. I am going to try to tackle this problem now.


