Checking for same sizes on images before loading?

I would like to confirm that all image files in a folder are exactly the same size *before I load the image files*. For example, I want to raise a flag when I choose to open all image files in a folder that contains 10 images at 2 MB each and one image at 1.5 MB. The intent is to prevent loading a folder into a stack when the images in the folder are mismatched in size.

Is there an efficient way to do this, even if it means breaking out to the shell or DOS level with ExecuteScript?

How about using Open and FStatus?  It was fast enough (<0.1 seconds) for ~80 files.

FUNCTION Get_File_Sizes(String strFolder_Path)
	
	//Get the path in case it's empty
	IF(strlen(strFolder_Path)==0)
	
		NewPath/Q/O/M="Select folder" pTemp_Path			//Gets the path to a FOLDER (not a file)
		PathInfo pTemp_path
			
		IF(strlen(S_Path)==0)
			Return -1
		ELSE
			strFolder_Path=S_Path
		ENDIF
		
		KillPath/Z pTemp_path	
	ENDIF
	
	Variable vStart=StartMSTimer
	
	//Get the files in the folder
	NewPath/O/Q/Z pFolder_Path, strFolder_Path
	
	String strFile_Names_List=IndexedFile(pFolder_Path, -1, "????")
	Variable vNum_Files=ItemsInList(strFile_Names_List)
	
	IF(vNum_Files>=2)
	
		Make/O/T/N=(vNum_Files) File_Names=StringFromList(p, strFile_Names_List)
		Sort/A File_Names, File_Names
		
		Make/O/L/U/N=(vNum_Files) File_Sizes=0
		
		//Get the file sizes
		Close/A 
		
		Variable iFileDex, vRefNum
		FOR(iFileDex=0;iFileDex<vNum_Files;iFileDex+=1)
			Open/R/Z/P=pFolder_Path vRefNum	as File_Names[iFileDex]
			
			IF(V_Flag==0)
				FStatus vRefNum
				File_Sizes[iFileDex]=V_logEOF
				Close vRefNum		//Only close when the open succeeded
			ENDIF
		ENDFOR
		
		Close/A
		
		Variable vStop=StopMSTimer(vStart)
		Print vStop/1e6
		
		//See if there are any files with a different size
		FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes
		
		IF(numpnts(Unique_File_Sizes)==1)
			Print "File sizes all match!"
		
		ELSE		//For each unique size, make a wave with that size and the name
			Variable iSizeDex
			FOR(iSizeDex=0;iSizeDex<numpnts(Unique_File_Sizes);iSizeDex+=1)
				Extract/O File_Sizes, $"Index_"+num2istr(iSizeDex)+"_Sizes", File_Sizes==Unique_File_Sizes[iSizeDex]		//Probably could just stuff the size into the wave note of the names wave
				Extract/O/T File_Names, $"Index_"+num2istr(iSizeDex)+"_Names", File_Sizes==Unique_File_Sizes[iSizeDex]
			ENDFOR
		ENDIF
		
	ELSE
		Variable vUnused=StopMSTimer(vStart)		//Release the timer started above
		Print "There are fewer than two files in the folder."
	ENDIF
	
	KillPath/Z pFolder_Path
		
END

Edit: Changed the file size wave from a double to a long unsigned integer.  I don't think the size will ever be negative or a non-integer.

Instead of Open you could also try:

GetFileFolderInfo/Q/Z/P=pFolder_Path File_Names[iFileDex]
File_Sizes[iFileDex]=V_logEOF

 

The help for GetFileFolderInfo says that V_logEOF is the number of bytes in the data fork, while V_logEOF from FStatus is the total number of bytes in the file, which had always made me think that those values would be different.  However, when I checked several file types (.h5, .png), the size was the same from both methods.  Maybe one of the WaveMetrics folks can chime in.

 

However, using GetFileFolderInfo in the loop is several times slower than using Open (~0.12 s versus ~0.04 s for 80 files).  

I also didn't understand that part, but it is probably fine for finding very different file sizes. But it makes sense that GetFileFolderInfo is slower, since it grabs more info. Better use Open then.

"The help for GetFileFolderInfo says that V_logEOF is the number of bytes in the data fork, while V_logEOF from FStatus is the total number of bytes in the file"

They are both the number of bytes in the data fork.

The FStatus documentation would be more precise if it said "The number of bytes in the opened fork" which is always the data fork.

The Open operation has always opened the data fork only.

Apple dropped support for resource forks a long time ago so the distinction between data fork and resource fork is moot at this point. 
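
To check this on your own files, here is a minimal sketch that reads V_logEOF both ways for a single file and reports whether the two values agree. The function name Compare_LogEOF and its argument are hypothetical (they do not appear elsewhere in this thread), and strFile_Path is assumed to be a full path to an existing file.

```igor
// Hypothetical helper for illustration only.
// Returns 1 if both methods report the same size, 0 if they differ, -1 on error.
Function Compare_LogEOF(String strFile_Path)

	// Method 1: open the file read-only and ask FStatus
	Variable vRefNum
	Open/R/Z vRefNum as strFile_Path
	if (V_flag != 0)
		return -1
	endif
	FStatus vRefNum
	Variable vFromFStatus = V_logEOF
	Close vRefNum

	// Method 2: ask GetFileFolderInfo without opening the file
	GetFileFolderInfo/Q/Z strFile_Path
	if (V_flag != 0 || !V_isFile)
		return -1
	endif
	Variable vFromInfo = V_logEOF

	Printf "FStatus: %d bytes, GetFileFolderInfo: %d bytes\r", vFromFStatus, vFromInfo
	return (vFromFStatus == vFromInfo)
End
```

Both operations write to the same automatically created local variable V_logEOF, which is why the first value is copied into vFromFStatus before GetFileFolderInfo is called.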

In case anyone needs it, here is a version that checks three things. In my applications, the number of files must be four or more; otherwise, the stack becomes an RGB image. I do not allow mixtures of file types to create a stack. Finally, I check for the same file size.


// input file name list (unparsed) in path imgPath
// return 1 if valid, 0 if invalid
Static Function f_IsValidateforStack(string fList, variable sizecheck)

	variable nf, nt, vRefNum, ic
	string tlist, plist, jlist
	string theFile, fName
	
	// check number of files (stacks must be 4+ images)
	nt = ItemsInList(flist)	
	if (nt < 4)
		return 0
	endif
	
	// check file names (no stacks from combinations of image types)
	tlist = ListMatch(fList,"*.tif")
	tlist += ListMatch(fList,"*.tiff")
	nf = ItemsInList(tlist,";")
	nt = nf != 0 ? 1 : 0
	
	plist = ListMatch(fList,"*.png")
	nf = ItemsInList(plist,";")
	nt = nf != 0 ? nt + 1 : nt
	
	jlist = ListMatch(fList,"*.jpg")
	jlist += ListMatch(fList,"*.jpeg")
	nf = ItemsInList(jlist,";")
	nt = nf != 0 ? nt + 1 : nt

	if (nt > 1)
		return 0
	endif
	
	// check file sizes (images must be same sizes)
	if (sizecheck)
		nt = ItemsInList(flist)	
		Make/D/N=(nt)/FREE File_Sizes = NaN
		for (ic=0;ic<nt;ic+=1)
			theFile = StringFromList(ic,fList)
			fName = ParseFilePath(0,theFile,":",1,0)
			Open/R/Z/P=imgPath vRefNum as fName
			if (v_flag==0)
				FStatus vRefNum
				File_Sizes[ic]=V_logEOF
				Close vRefNum		// only close when the open succeeded
			endif
		endfor		
		Close/A		
		FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes		
		if (numpnts(Unique_File_Sizes) != 1)
			return 0
		endif
	endif
	
	return 1
end

 

Jeff- I see you allow tiff, png and jpg. If no compression is applied, then the number of bytes in the file will be the same as the number of bytes in the ultimate image. But if any compression is done, then different images may wind up with different file sizes. Especially with jpg, the file size will depend on the quality setting and the amount of high-frequency features in the image. In tiff and png images, I would imagine that large patches of zeroes would compress almost to nothing.

Thanks for the heads up, John. I've implemented a restriction to limit the creation of stacks to TIFF images only. As to the possibility of missing the true size differences for compressed TIFFs, only one case will fail in my revised approach. Failure will occur when the compressed TIFF files on the drive are all **exactly** the same size, but at least one (out of a minimum of four) of the loaded images has a different uncompressed size than the others. I'll take this as an edge case for someone with greater motivation to tackle.

// input file name list (unparsed) in path imgPath
// return 1 if valid, 0 if invalid
Static Function f_IsValidateforStack(string fList, variable sizecheck)

	variable nt, vRefNum, ic
	string tlist, theFile, fName
	
	// check number of files (stacks must be 4+ images)
	nt = ItemsInList(flist)	
	if (nt < 4)
		return 0
	endif
	
	// check file names (stacks only allowed from tiff)	
	tlist = ListMatch(fList,"*.png")
	tlist += ListMatch(fList,"*.jpg")
	tlist += ListMatch(fList,"*.jpeg")
	nt = ItemsInList(tlist,";")
	if (nt > 0)
		return 0
	endif
	
	// check file sizes (images must be same sizes)
	if (sizecheck)
		nt = ItemsInList(flist)	
		Make/D/N=(nt)/FREE File_Sizes = NaN
		for (ic=0;ic<nt;ic+=1)
			theFile = StringFromList(ic,fList)
			fName = ParseFilePath(0,theFile,":",1,0)
			Open/R/Z/P=imgPath vRefNum as fName
			if (v_flag==0)
				FStatus vRefNum
				File_Sizes[ic]=V_logEOF
				Close vRefNum		// only close when the open succeeded
			endif
		endfor		
		Close/A		
		FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes		
		if (numpnts(Unique_File_Sizes) != 1)
			return 0
		endif
	endif
	
	return 1
end

 

If I read this correctly, the only passing cases will be sets of TIFFs with zero compression, the false positive edge case that you mention, or a set of TIFFs where the compression fortuitously gives the same file size. Why not check the file header (actually, the Image File Directory/Directories) for the image width(s) and height(s)? That's what you're really trying to check, right?

function GetHeightAndWidth(string strPath)
	
	variable refNum
	if (strlen(strPath))
		Open/R refNum as strPath
	else
		string fileFilters = "TIFF Files (*.tif,*.tiff:.tif,.tiff;)"
		Open/R/F=fileFilters refNum
		Print s_filename
	endif

	if (strlen(s_filename) == 0)
		return 0
	endif
	string strByteOrder = "00"
	int nextDirectory, theAnswer, byteOrder, numEntries, width, height
	int i, j
	int imax = 256 // maximum number of images to look for in one file
	int iTag, iType, iCount, iValue, iJunk
	
	FBinRead refNum, strByteOrder
	strswitch (strByteOrder)
		case "II" :
			byteOrder = 3
			break
		case "MM" :
			byteOrder = 2
			break
	endswitch

	FBinRead/U/F=2/B=(byteOrder) refNum, theAnswer // should be 42
	if (theAnswer != 42)
		Close refNum
		DoAlert 0, "could not read file"
		return 0
	endif
	
	// read the Image File Directory/Directories
	for (i=0;i<imax;i++)
		FBinRead/U/F=3/B=(byteOrder) refNum, nextDirectory
		if (!nextDirectory)
			break
		endif
		FSetPos refNum, nextDirectory
		FBinRead/U/F=2/B=(byteOrder) refNum, numEntries
		width = 0
		height = 0
		// loop through Image File Directory
		for (j=0;j<numEntries;j++)
			FBinRead/U/F=2/B=(byteOrder) refNum, iTag
			FBinRead/U/F=2/B=(byteOrder) refNum, iType
			FBinRead/U/F=3/B=(byteOrder) refNum, iCount

			// for the values we're chasing, iType is 3 or 4 (two or four byte integer)
			// the value should always be found in the IFD, no need to interpret a pointer

			if (iType == 3)
				FBinRead/U/F=2/B=(byteOrder) refNum, iValue
				FBinRead/U/F=2/B=(byteOrder) refNum, iJunk
			else
				FBinRead/U/F=3/B=(byteOrder) refNum, iValue
			endif

			if (iTag == 256)
				width = iValue
			elseif (iTag == 257)
				height = iValue
			endif
		endfor
		Print "width", width, "height", height
	endfor
	
	Close refnum	
end
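
Building on the function above, here is a rough sketch of how the same header read could be turned into a pass/fail check for a list of files. ReadFirstIFDSize and AllSameDimensions are hypothetical names, not part of the code posted in this thread. The sketch assumes that pathName is the name of an existing symbolic path, that fList is a semicolon-separated list of file names in that path, and that only the first image in each file matters. It does not guard against truncated files.

```igor
// Hypothetical helper: reads width and height from the first Image File
// Directory only. Returns 0 on success, -1 otherwise.
Function ReadFirstIFDSize(String pathName, String fileName, Variable &width, Variable &height)

	Variable refNum
	Open/R/Z/P=$pathName refNum as fileName
	if (V_flag != 0)
		return -1
	endif

	String strByteOrder = "00"
	Variable byteOrder, magic, offset, numEntries, j
	Variable iTag, iType, iCount, iValue, iJunk

	FBinRead refNum, strByteOrder
	if (cmpstr(strByteOrder, "II", 1) == 0)
		byteOrder = 3		// little-endian
	elseif (cmpstr(strByteOrder, "MM", 1) == 0)
		byteOrder = 2		// big-endian
	else
		Close refNum
		return -1
	endif

	FBinRead/U/F=2/B=(byteOrder) refNum, magic
	if (magic != 42)
		Close refNum
		return -1
	endif

	FBinRead/U/F=3/B=(byteOrder) refNum, offset		// offset of the first IFD
	FSetPos refNum, offset
	FBinRead/U/F=2/B=(byteOrder) refNum, numEntries

	width = 0
	height = 0
	for (j = 0; j < numEntries; j += 1)
		FBinRead/U/F=2/B=(byteOrder) refNum, iTag
		FBinRead/U/F=2/B=(byteOrder) refNum, iType
		FBinRead/U/F=3/B=(byteOrder) refNum, iCount
		if (iType == 3)		// 2-byte value followed by 2 bytes of padding
			FBinRead/U/F=2/B=(byteOrder) refNum, iValue
			FBinRead/U/F=2/B=(byteOrder) refNum, iJunk
		else
			FBinRead/U/F=3/B=(byteOrder) refNum, iValue
		endif
		if (iTag == 256)
			width = iValue
		elseif (iTag == 257)
			height = iValue
		endif
	endfor

	Close refNum
	return (width > 0 && height > 0) ? 0 : -1
End

// Returns 1 if every file in fList has the same first-image dimensions, else 0.
Function AllSameDimensions(String pathName, String fList)

	Variable n = ItemsInList(fList), i, w, h, w0, h0
	for (i = 0; i < n; i += 1)
		if (ReadFirstIFDSize(pathName, StringFromList(i, fList), w, h) != 0)
			return 0
		endif
		if (i == 0)
			w0 = w
			h0 = h
		elseif (w != w0 || h != h0)
			return 0
		endif
	endfor
	return n > 0
End
```

With the symbolic path imgPath used earlier in the thread, a call might look like Print AllSameDimensions("imgPath", IndexedFile(imgPath, -1, ".tif")). Unlike the file-size test, this comparison should not depend on whether the files are compressed.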

 

FBinRead/U/F=2/B=(byteOrder) refNum, theAnswer // should be 42
if (theAnswer != 42)
...

LOL, I wonder if this is some deliberate joke by the creators of TIFF.

Thanks Tony. I had thought that I eventually might do a read-only ImageLoad operation to capture the TAGs. Your approach may be less cumbersome. 

The TIFF format is described here

@chozo, the documentation describes it as "an arbitrary but carefully chosen number"

@jjweimer, I edited the snippet to properly handle the case where height and width are encoded as 2-byte integers. I doubt that there are actually any files where this is the case.