dir_callback – Let's Play with Directories (PHP)

PHP provides some handy functions for file and directory handling, but it offers little for dealing with nested directories. There are basic directory functions and a directory iterator, but they do not cover directory-level operations such as moving, cloning, counting files, or deleting.

Here we have a very simple directory function that takes a path, iterates through all its files and sub-directories (nodes), and passes the path of each node to a callback function.

function dir_callback( $dir , $callback , $args = array( ) )
{
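    $dir    = realpath( $dir );
    // realpath() returns false for a path that does not exist; skip it in that case.
    if( $dir !== false && ( $handle = opendir( $dir ) ) )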
    {
        while( false !== ( $dir_item = readdir( $handle ) ) )
        {
            if( !( $dir_item === '..' || $dir_item === '.' ) )
            {
                $dir_item_path  = $dir . DIRECTORY_SEPARATOR . $dir_item;
               
                $callback( $dir_item_path , $args );
               
                if( is_dir( $dir_item_path ) )
                {
                    dir_callback( $dir_item_path , $callback , $args );
                }
            }
        }
        closedir( $handle );
    }
}

Using this simple function, we can do much of what we want. The function itself does nothing except pass the full path of each node (directory or file) to the callback function, which then decides how to use that path. Notice that the function does not even return anything, so it does not force a fixed way of storing data across recursive calls; the callback has to define its own way to store data. For simplicity, I am using $GLOBALS. Alternatively, you may store data in a static variable or an array, or write your own data storage/retrieval class.
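As one possible alternative to $GLOBALS, the sketch below keeps the running total in a local variable that a closure captures by reference. The wrapper name dir_size_total is hypothetical and not part of the original code, and a compact copy of dir_callback is repeated only so that the snippet runs on its own.

```php
<?php
// Compact copy of dir_callback() so that this snippet is self-contained.
if( !function_exists( 'dir_callback' ) )
{
    function dir_callback( $dir , $callback , $args = array( ) )
    {
        $dir = realpath( $dir );
        if( $dir !== false && ( $handle = opendir( $dir ) ) )
        {
            while( false !== ( $item = readdir( $handle ) ) )
            {
                if( $item === '.' || $item === '..' ) { continue; }
                $item_path = $dir . DIRECTORY_SEPARATOR . $item;
                $callback( $item_path , $args );
                if( is_dir( $item_path ) )
                {
                    dir_callback( $item_path , $callback , $args );
                }
            }
            closedir( $handle );
        }
    }
}

// Hypothetical wrapper: sums file sizes without touching $GLOBALS.
function dir_size_total( $dir )
{
    $total = 0;

    // "use ( &$total )" captures the local variable by reference, so every
    // callback invocation, at any nesting depth, updates the same counter.
    dir_callback( $dir , function( $path ) use ( &$total ) {
        if( is_file( $path ) )
        {
            $total += filesize( $path );
        }
    } );

    return $total;
}
```

With this approach the result comes back as a return value, for example `$bytes = dir_size_total( 'E:\files' );`, and nothing has to be initialized in the global scope first.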

I have written some callback functions below to be used in combination with dir_callback.

Calculating Directory Size (in Bytes)

function dir_size( $path )
{
    if( is_file( $path ) )
    {
        $GLOBALS['filesize']    += filesize( $path );
    }
}
 
$path     = realpath( 'E:\files' );
$filesize = 0;
dir_callback( $path , 'dir_size' );
 
//echo $filesize . ' Bytes';

List all Files and Sub-Directories

function dir_list( $path )
{
    $GLOBALS['files'][] = $path;
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_list' );
   
//print_r( $files );

Get Directory Structure

The above callback function returns all files and directories inside the parent directory. Here we have a function that returns an array of sub-directories only, constituting the whole directory structure.

function dir_structure( $path )
{
    if( is_dir( $path ) )
    {
        $GLOBALS['files'][] = $path;
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_structure' );
 
//print_r( $files );

Styling Paths and File Names

This function will color the file name and the file path separately. Similar styling is used by many online storage services to display easy to read file names in deep nested file paths.

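function dir_style( $path )
{
    $last_ds    = strrpos( $path , DIRECTORY_SEPARATOR );
    // The span colors are examples only; the directory part and the file name get different ones.
    $GLOBALS['files']   .= '<span style="color:gray">'.htmlspecialchars( substr( $path , 0 , $last_ds ) ).'</span>'.
                            '<span style="color:blue">'.htmlspecialchars( substr( $path , $last_ds ) ).'</span><br />';
}

$path    = realpath( 'E:\files' );
// The callback appends markup, so the result starts as an empty string, not an array.
$files   = '';
dir_callback( $path , 'dir_style' );

//echo $files;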

List all Files by Extension

function dir_list_by_ext( $path )
{
    if( is_file( $path ) )
    {
        $GLOBALS['files'][pathinfo( $path , PATHINFO_EXTENSION )][] = $path;
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_list_by_ext' );
 
//print_r( $files );

Count Files and Directories

function dir_count( $path )
{
    if( is_file( $path ) )
    {
        $GLOBALS['files']['file'] = ( isset( $GLOBALS['files']['file'] ) ? $GLOBALS['files']['file'] + 1 : 1 );
    }
   
    if( is_dir( $path ) )
    {
        $GLOBALS['files']['dir'] = ( isset( $GLOBALS['files']['dir'] ) ? $GLOBALS['files']['dir'] + 1 : 1 );
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_count' );
 
//print_r( $files );

Getting List of Unique File Names

This function lists the unique names of all files in the given directory.

function dir_filenames( $path )
{
    if( is_file( $path ) )
    {
        $filename   = basename( $path );
        $GLOBALS['files'][$filename]    = true;
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_filenames' );
 
//print_r( $files );

Get Duplicate Files (Using File Names)

This function uses file names to identify duplicates at any level within a directory and collects the paths of such files.

function dir_get_duplicates_by_filename( $path )
{
    dir_callback( $path , function( $path ) {
        if( is_file( $path ) )
        {
            $filename   = basename( $path );
            $GLOBALS['files'][$filename][]  = $path;
        }
    } );
   
    // Keep only the names that occur more than once.
    $GLOBALS['files']   = array_filter( $GLOBALS['files'] , function ( $e ) {
                    return count( $e ) > 1;
                } );
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_get_duplicates_by_filename( $path );
 
//print_r( $files );

Get Duplicate Files (Using MD5)

This function uses the MD5 hash of every file within a directory (at any level) to identify duplicates and collects a list of such duplicates.

function dir_get_duplicates_by_md5( $path )
{
    dir_callback( $path , function( $path ) {
        if( is_file( $path ) )
        {
            // md5_file() hashes the file without first loading it into a string.
            $md5    = md5_file( $path );
            $GLOBALS['files'][$md5][]   = $path;
        }
    } );
    
    // Keep only the hashes shared by more than one file.
    $GLOBALS['files']   = array_filter( $GLOBALS['files'] , function ( $e ) {
                    return count( $e ) > 1;
                } );
}

$path    = realpath( 'E:\files' );
$files   = array( );
dir_get_duplicates_by_md5( $path );

//print_r( $files );
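Hashing every file can be slow on a large tree. Since two files of different sizes cannot be identical, one possible refinement is to group files by filesize() first and hash only the groups that contain more than one file. The sketch below does that under the hypothetical name dir_duplicates_by_size_then_md5; it is not part of the original code, and a compact copy of dir_callback is included only so that the snippet runs on its own.

```php
<?php
// Compact copy of dir_callback() so that this snippet is self-contained.
if( !function_exists( 'dir_callback' ) )
{
    function dir_callback( $dir , $callback , $args = array( ) )
    {
        $dir = realpath( $dir );
        if( $dir !== false && ( $handle = opendir( $dir ) ) )
        {
            while( false !== ( $item = readdir( $handle ) ) )
            {
                if( $item === '.' || $item === '..' ) { continue; }
                $item_path = $dir . DIRECTORY_SEPARATOR . $item;
                $callback( $item_path , $args );
                if( is_dir( $item_path ) )
                {
                    dir_callback( $item_path , $callback , $args );
                }
            }
            closedir( $handle );
        }
    }
}

// Hypothetical helper: returns array( md5 => array( paths ) ) for duplicates only.
function dir_duplicates_by_size_then_md5( $dir )
{
    // Pass 1: group every file by its size in bytes.
    $by_size = array( );
    dir_callback( $dir , function( $path ) use ( &$by_size ) {
        if( is_file( $path ) )
        {
            $by_size[filesize( $path )][] = $path;
        }
    } );

    // Pass 2: hash only the files that share their size with another file.
    $by_md5 = array( );
    foreach( $by_size as $paths )
    {
        if( count( $paths ) < 2 ) { continue; }
        foreach( $paths as $path )
        {
            $by_md5[md5_file( $path )][] = $path;
        }
    }

    // Keep only the hashes shared by more than one file.
    return array_filter( $by_md5 , function( $paths ) {
        return count( $paths ) > 1;
    } );
}
```

Files with a unique size are never read at all, which is where the saving comes from; files that merely share a size are still told apart by their hash.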

Clone a Directory (Creating a Copy of Directory)

function dir_clone( $path , $args )
{
    $source_base_len    = strlen( $args['from'] );
   
    $dest_path  = substr_replace( $path , $args['to'] , 0 , $source_base_len );
   
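    if( is_dir( $path ) )
    {
        // Skip directories that already exist so that the clone can be re-run.
        if( !is_dir( $dest_path ) )
        {
            mkdir( $dest_path , 0777 , true );
        }
    }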
    else if( is_file( $path ) )
    {
        copy( $path , $dest_path );
    }
}

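$path   = realpath( 'E:\files' );
$dest   = 'E:\More_Files';
// The callback never receives the root itself, so create the destination root first.
if( !is_dir( $dest ) )
{
    mkdir( $dest , 0777 , true );
}
dir_callback( $path , 'dir_clone' , array( 'from' => $path , 'to' => $dest ) );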

Truncate a Directory but Preserve Directory Structure

function dir_empty( $path )
{
    if( is_file( $path ) )
    {
        unlink( $path );
    }
}
 
$path    = realpath( 'E:\files' );
dir_callback( $path , 'dir_empty' );
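dir_empty leaves the empty directories behind. Because dir_callback reports a directory before it descends into it, the directories cannot be removed from inside the callback. One possible approach, sketched below under the hypothetical name dir_remove_tree, is to delete files during the walk, remember the directories, and remove them afterwards in reverse order so that children go before their parents. A compact copy of dir_callback is included only so that the snippet runs on its own.

```php
<?php
// Compact copy of dir_callback() so that this snippet is self-contained.
if( !function_exists( 'dir_callback' ) )
{
    function dir_callback( $dir , $callback , $args = array( ) )
    {
        $dir = realpath( $dir );
        if( $dir !== false && ( $handle = opendir( $dir ) ) )
        {
            while( false !== ( $item = readdir( $handle ) ) )
            {
                if( $item === '.' || $item === '..' ) { continue; }
                $item_path = $dir . DIRECTORY_SEPARATOR . $item;
                $callback( $item_path , $args );
                if( is_dir( $item_path ) )
                {
                    dir_callback( $item_path , $callback , $args );
                }
            }
            closedir( $handle );
        }
    }
}

// Hypothetical helper: deletes $dir together with everything inside it.
function dir_remove_tree( $dir )
{
    $dirs = array( );

    dir_callback( $dir , function( $path ) use ( &$dirs ) {
        if( is_dir( $path ) && !is_link( $path ) )
        {
            // Remember real directories; they are removed after the walk.
            $dirs[] = $path;
        }
        else
        {
            // Files are deleted right away. Links are unlinked rather than
            // followed (this assumes a POSIX-style filesystem).
            unlink( $path );
        }
    } );

    // Parents were recorded before their children, so reverse the order.
    foreach( array_reverse( $dirs ) as $sub_dir )
    {
        rmdir( $sub_dir );
    }

    return is_dir( $dir ) && rmdir( $dir );
}
```

The function returns true only when the root directory itself was removed, so a caller can check the result before assuming that the tree is gone.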

List all Files by their Sizes

function dir_list_by_size( $path )
{
    if( is_file( $path ) )
    {
        $GLOBALS['files'][filesize( $path )][]  = $path;
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_list_by_size' );
 
//print_r( $files );

List all Files by their Last Modified Time

function dir_list_by_lastmodified( $path )
{
    if( is_file( $path ) )
    {
        $GLOBALS['files'][filemtime( $path )][] = $path;
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_list_by_lastmodified' );
 
//print_r( $files );

List all Files by their Last Access Time

function dir_list_by_lastaccess( $path )
{
    if( is_file( $path ) )
    {
        $GLOBALS['files'][fileatime( $path )][] = $path;
    }
}
 
$path    = realpath( 'E:\files' );
$files   = array( );
dir_callback( $path , 'dir_list_by_lastaccess' );
 
//print_r( $files );

All the above functions are useful for operations such as disk cache management, uploads management, deleting a user account (removing all data uploaded by the user), optimizing file system usage (by removing duplicates), and summarized reporting directly from the file system.
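As an illustration of the disk cache case, here is a minimal sketch of a callback that deletes files older than a cutoff time, passed in through the $args parameter of dir_callback. The callback name dir_prune_older_than and the 'cutoff' key are hypothetical, and a compact copy of dir_callback is included only so that the snippet runs on its own.

```php
<?php
// Compact copy of dir_callback() so that this snippet is self-contained.
if( !function_exists( 'dir_callback' ) )
{
    function dir_callback( $dir , $callback , $args = array( ) )
    {
        $dir = realpath( $dir );
        if( $dir !== false && ( $handle = opendir( $dir ) ) )
        {
            while( false !== ( $item = readdir( $handle ) ) )
            {
                if( $item === '.' || $item === '..' ) { continue; }
                $item_path = $dir . DIRECTORY_SEPARATOR . $item;
                $callback( $item_path , $args );
                if( is_dir( $item_path ) )
                {
                    dir_callback( $item_path , $callback , $args );
                }
            }
            closedir( $handle );
        }
    }
}

// Hypothetical callback: deletes files last modified before $args['cutoff']
// (a Unix timestamp). Directories are left in place.
function dir_prune_older_than( $path , $args )
{
    if( is_file( $path ) && filemtime( $path ) < $args['cutoff'] )
    {
        unlink( $path );
    }
}
```

For instance, `dir_callback( $cache_dir , 'dir_prune_older_than' , array( 'cutoff' => time( ) - 7 * 24 * 3600 ) );` would remove files that have not been modified for a week, where $cache_dir is a placeholder for your own cache directory.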