Creating a cache file in php for API requests

Web 2.0 was all about mashups and data APIs. Some of the popular ones are the Google Search API and the Twitter API.

Most APIs have some form of rate limiting. For example, the Twitter API has a rate limit of 150 requests per hour, and the Yahoo and Google APIs have a daily limit on their requests.

The most reliable way of staying within these request limits is to cache the data coming from the API and to refresh that cache hourly, daily or weekly.

That's why I created a small function that stores the API data in an XML cache file, from which it can be retrieved later. The function is below. Most of the functionality is explained in the comments; if anything is not clear, contact me via the contact page.

<?php
function get_yahoo_data_cached($query, $zip)
{
    // Rewrite by Julius Beckmann

    // Remove dangerous chars
    $query_safe = str_replace(array('.', '/'), '_', $query);
    $zip_safe   = str_replace(array('.', '/'), '_', $zip);

    $cache_filename = "cache/$query_safe-$zip_safe.xml";
    $lifetime = 24 * 60 * 60; // Expire time (1 day)

    // Check file modification time
    if (!file_exists($cache_filename) || filemtime($cache_filename) + $lifetime <= time()) {
        // Cache is missing or too old, refresh it
        $xml = get_yahoo_data($query, $zip);

        // Remove cache file on error to avoid writing wrong XML
        if ($xml) {
            file_put_contents($cache_filename, $xml);
        } elseif (file_exists($cache_filename)) {
            unlink($cache_filename);
        }
    } else {
        // Fetch from cache
        $xml = file_get_contents($cache_filename);
    }

    return $xml;
}
?>

Safe rewrite of the function by Julius Beckmann
This function creates a cache file for Yahoo API requests. It relies on get_yahoo_data(), a custom function that fetches data from the Yahoo API and returns it as XML.

In short, the original version of the function used two variables for the cache: $filename held the cache file and $file_log_name held the last update time for that cache. The rewrite keeps a single cache file and reads the timestamp from the file itself.


28 thoughts on “Creating a cache file in php for API requests”

  1. @jlintz
    Memcache is not installed on a lot of shared hosting servers, and PHP is used a lot on these shared hosting packages. A file cache is therefore a much better way of writing architecture-independent code.

  2. Surprisingly, yesterday I wrote something like that to cache results from the Twitter, YouTube and Flickr APIs in a database.

    — This is what I did —

    Before saving the results, I SERIALIZE the data, set an expiry time and save it in the db.
    To retrieve it, the script checks the expiry time. If the time has not expired yet, it just gets the serialized data, UNSERIALIZEs it and returns it in its native state. Otherwise it accesses the API again, serializes the result, sets a new expiry time and saves it.

    I hope this helps someone.
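
The database approach described in this comment could be sketched roughly as follows. This is only an illustration under assumptions: the table name api_cache, the helper names db_cache_init() and db_cache_get(), and the use of PDO are not from the comment, and the REPLACE INTO statement assumes SQLite or MySQL.

```php
<?php
// Hypothetical helpers: names, table layout and PDO usage are assumptions.

function db_cache_init(PDO $db)
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS api_cache (
            cache_key  VARCHAR(255) PRIMARY KEY,
            payload    TEXT NOT NULL,
            expires_at INTEGER NOT NULL
        )'
    );
}

function db_cache_get(PDO $db, $key, $ttl, callable $fetch)
{
    $stmt = $db->prepare('SELECT payload, expires_at FROM api_cache WHERE cache_key = ?');
    $stmt->execute(array($key));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    // Still fresh: unserialize and return the data in its native state.
    if ($row !== false && (int) $row['expires_at'] > time()) {
        return unserialize($row['payload']);
    }

    // Missing or expired: call the API again and store the serialized result
    // together with a new expiry time.
    $data = $fetch();
    $stmt = $db->prepare('REPLACE INTO api_cache (cache_key, payload, expires_at) VALUES (?, ?, ?)');
    $stmt->execute(array($key, serialize($data), time() + $ttl));

    return $data;
}
```

A caller would create a PDO connection, run db_cache_init() once, and then pass a closure that performs the real API request as $fetch. The closure only runs when no unexpired row exists for the key.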


  3. @Mardix

    That’s a very valuable piece of advice. Surely it will help a lot of people.

    However, saving the cache in a database can cause problems later on. The main one is that, past a certain point, these heavy SQL queries make the server very slow.

    Hope it helps

  4. A few comments on my rewrite.
    The old function used 2 files to save the data: 1 file that kept the data, while the other file was just there to keep the last changed timestamp.
    This approach has several disadvantages.
    – Using 2 files is slower and needs more space.
    – Using a timestamp file could be a problem if another program changed the cache file without changing the timestamp file.
    – The timestamp of the last change to a file is stored within the filesystem itself. This value can be accessed very quickly and is also very reliable.

    There were also some security holes in this function. Reading and writing files should be controlled strictly. It was possible to read or write a file by including ‘../’ to go to the parent directory and using a null byte at the end to select a file other than the one specified.

    A bug was also fixed: if get_yahoo_data() could not fetch the data and did not return the correct XML, this wrong XML would be cached.
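
The filename concern raised in this comment can be illustrated with a stricter variant of the sanitizing step. This is a sketch, not the code from the rewrite: the helper name cache_path_for() and the whitelist are assumptions. Instead of replacing only '.' and '/', it replaces every character outside a small allowed set, which also covers backslashes and null bytes.

```php
<?php
// Hypothetical helper: the name and the allowed character set are assumptions.
function cache_path_for($cache_dir, array $parts)
{
    $safe = array();
    foreach ($parts as $part) {
        // Anything that is not a letter, digit, underscore or hyphen becomes '_',
        // so '../', backslashes and null bytes cannot reach the filesystem.
        $safe[] = preg_replace('/[^A-Za-z0-9_-]/', '_', (string) $part);
    }

    return rtrim($cache_dir, '/') . '/' . implode('-', $safe) . '.xml';
}
```

For example, cache_path_for('cache', array('pizza', '10001')) gives cache/pizza-10001.xml. One trade-off of this approach is that different inputs can map to the same file name (for instance 'a.b' and 'a/b'), so appending a hash of the raw input may be worth considering if that matters.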

  5. @inversechi

    Sounds great. It would also help if you could share the link to that plugin. I have moved to CI myself and would love to use that plugin.

  6. This helped me get started with writing a simple caching plugin for APIs to use within CodeIgniter – thank you 🙂

  7. This was originally designed to create cache files out of API requests. Sometimes these grew to many MB. I am not sure memcache is the right approach for that.

  8. What difficulties?

    Please note that I am not creating folders in this tutorial; that needs a separate function. I am just creating files in a folder.

  9. @Nomi Just replace file_get_contents() with the curl download function and this would work. Let me know if it helps.
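
A curl download function of the kind this reply mentions might look like the sketch below. It assumes the reply refers to the call that downloads from the remote API (presumably inside get_yahoo_data(), which is not shown in the post) rather than the one that reads the cache file. The function name fetch_url_with_curl() and the timeout values are assumptions, and the PHP curl extension must be installed.

```php
<?php
// Hypothetical helper: the name and the timeout values are assumptions.
function fetch_url_with_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // seconds to wait for a connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds allowed for the whole request

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and non-2xx responses as failures, so the
    // caller does not write an error page into the cache.
    if ($body === false || $status < 200 || $status >= 300) {
        return false;
    }

    return $body;
}
```

Returning false on failure fits the check in the caching function, which only writes the cache file when the fetched data is truthy.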

  10. I don’t understand the if statement

    if(filectime($filename) <= $time_expire). Shouldn't this always be true? You have set $time_expire to right now + 24 h.

    Maybe this is more correct?

    if( filemtime($filename) + 24*60*60 <= time())

  11. It’s:

    $time_expire = filemtime($cache_filename) + 60 * 10; // + 10 min
    if($time_expire < time()){
        // from api
    } else {
        // from file
    }
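
The expiry check from the last two comments can be put into a small, self-contained helper. This is a sketch under assumptions rather than the post's function: the name cached_fetch(), the callable parameter and the default lifetime of 10 minutes are illustrative.

```php
<?php
// Hypothetical helper: the name, parameters and default lifetime are assumptions.
function cached_fetch($cache_filename, callable $fetch, $lifetime = 600)
{
    // PHP caches stat results, so clear them before reading the timestamp.
    clearstatcache(true, $cache_filename);

    // The cache is fresh while (last modification time + lifetime) is not in the past.
    $is_fresh = is_file($cache_filename)
        && filemtime($cache_filename) + $lifetime >= time();

    if ($is_fresh) {
        return file_get_contents($cache_filename); // from file
    }

    $data = $fetch(); // from API

    // Only cache non-empty results, so a failed request is retried next time.
    if ($data !== false && $data !== null && $data !== '') {
        file_put_contents($cache_filename, $data, LOCK_EX);
    }

    return $data;
}
```

The is_file() check matters because filemtime() returns false and emits a warning when the cache file does not exist yet, which is the situation on the very first request.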

  12. How can I avoid the rate limit of reddit?
    If I enter wrong reddit account info, I have to wait for about 2 seconds.
    If I do it twice, 4 seconds; the third time, 10 min.
    So the rate limit increases geometrically.
    How can I disable this feature of the reddit site?

  13. @Adam

    You’d need to use a proxy to bypass this limit. I think it is IP-based and can be bypassed that way.
