List Files With Glob Patterns
Finding the right files on disk can feel like searching for a needle in a haystack. Whether you're a developer organizing project files, a researcher gathering specific documents, or just someone trying to declutter, you often need to locate files by pattern rather than by exact name. That is the job of a directory scanner. This article walks through a primitive, yet powerful, directory scanner that recursively walks a directory tree and returns the files matching a glob pattern. A typical task is finding every SKILL.md file scattered across your ~/.claude/skills/ directory, and that is precisely what the scanner is built to handle.
The Power of Glob Patterns: More Than Just Wildcards
At its core, our directory scanner relies on glob patterns. You are probably familiar with the basic wildcards: * matches any sequence of characters within a single path segment (it does not cross a / separator), and ? matches any single character. Glob syntax goes further than that. The ** pattern matches zero or more directories, so a search covers not just the current directory but every directory nested within it, however deep the hierarchy goes. Glob patterns also support alternation with {a,b} (matching either 'a' or 'b') and character classes such as [a-z] (matching any single character in the range 'a' to 'z'). Our scanner supports all of these, giving you fine-grained control over file discovery. The interface is deliberately small: scan(baseDir: string, pattern: string) → string[]. Called with a starting directory (baseDir) and a glob pattern, it returns an array of all matching file paths, sorted alphabetically, so the output is predictable and easy to work with.
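To make the four constructs concrete, here is a minimal sketch of a glob-to-regular-expression translator. It is an illustration under stated assumptions, not how the glob package or Bun works internally: the name globToRegExp is hypothetical, paths are assumed to use / as the separator, and, unlike many glob libraries, this sketch lets * match dotfiles.

```typescript
// Hypothetical helper: translate a glob into a RegExp for "/"-separated paths.
// Simplifications: no escaping via backslash, and "*" also matches dotfiles.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  let braceDepth = 0;
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern.charAt(i);
    if (ch === "*" && pattern.charAt(i + 1) === "*") {
      if (pattern.charAt(i + 2) === "/") {
        source += "(?:[^/]+/)*"; // "**/" = zero or more whole directories
        i += 3;
      } else {
        source += ".*"; // trailing "**" = anything, including "/"
        i += 2;
      }
      continue;
    }
    if (ch === "*") {
      source += "[^/]*"; // any run of characters inside one segment
    } else if (ch === "?") {
      source += "[^/]"; // exactly one character inside one segment
    } else if (ch === "[" && pattern.indexOf("]", i + 1) !== -1) {
      const close = pattern.indexOf("]", i + 1);
      let body = pattern.slice(i + 1, close);
      if (body.startsWith("!")) body = "^" + body.slice(1); // [!a-z] negation
      source += "[" + body.replace(/\\/g, "\\\\") + "]";
      i = close + 1;
      continue;
    } else if (ch === "{") {
      braceDepth++;
      source += "(?:"; // {a,b} becomes the alternation (?:a|b)
    } else if (ch === "}" && braceDepth > 0) {
      braceDepth--;
      source += ")";
    } else if (ch === "," && braceDepth > 0) {
      source += "|";
    } else {
      source += ch.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&"); // literal
    }
    i++;
  }
  source += ")".repeat(braceDepth); // tolerate an unclosed "{"
  return new RegExp("^" + source + "$");
}

console.log(globToRegExp("**/*.md").test("skills/pdf/SKILL.md"));
console.log(globToRegExp("*.{ts,js}").test("index.ts"));
```

The key design point is that * and ? are translated to [^/] classes so they stay within one path segment, while only ** is allowed to span directory boundaries.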
Building the Primitive Directory Scanner: Functionality and Features
Our mission is to build a primitive directory scanner that is both functional and reliable. The primary function, scan(baseDir, pattern), is straightforward: given a baseDir and a pattern, it returns an array of strings, where each string is the absolute path to a file that matches the glob pattern. A key requirement is recursive scanning. When the pattern **/*.md is used, the scanner must find all .md files, irrespective of their depth within baseDir, which matters for large and complex project structures. We also prioritize robustness: if baseDir does not exist, the scanner returns an empty array instead of throwing an error. This prevents unexpected crashes and makes the tool more forgiving in dynamic environments where directories may come and go. Finally, the acceptance criteria stipulate that the returned paths must be sorted alphabetically. It is a small detail, but it gives callers a consistent, deterministic output they can rely on.
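The requirements above (recursive walk, absolute paths, an empty array for a missing baseDir, sorted output) can be sketched with Node's standard library alone. This is a sketch under assumptions rather than the final implementation: scanWith is a hypothetical name, and the isMatch predicate stands in for whatever glob matcher the real scan(baseDir, pattern) would compile from its pattern argument.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: isMatch receives each file's path relative to baseDir,
// with "/" separators, and stands in for a compiled glob matcher.
function scanWith(baseDir: string, isMatch: (relPath: string) => boolean): string[] {
  const root = path.resolve(baseDir);
  const results: string[] = [];

  const walk = (dir: string): void => {
    let entries: fs.Dirent[];
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      return; // missing or unreadable directory: contribute nothing
    }
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full); // recurse; symlinks are not followed in this sketch
      } else if (entry.isFile()) {
        const rel = path.relative(root, full).split(path.sep).join("/");
        if (isMatch(rel)) results.push(full);
      }
    }
  };

  walk(root);
  // Default sort compares UTF-16 code units, so "Z.md" sorts before "a.md".
  // A locale-aware order would need localeCompare instead (an open choice).
  return results.sort();
}

// Usage: a trivial predicate in place of the pattern "**/*.md".
console.log(scanWith(".", (rel) => rel.endsWith(".md")));
```

Because a failed readdirSync simply ends that branch of the walk, a non-existent baseDir falls out as an empty array without a separate existence check.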
Rather than writing the pattern matching from scratch, we're considering existing libraries. The glob package is a popular choice for handling glob patterns in Node.js, offering a mature and feature-rich solution. Alternatively, if we are working within the Bun runtime, its built-in Bun.Glob class (constructed with a pattern and iterated with its scan() method) is a potentially more performant and integrated option. The choice will depend on the specific environment and performance requirements. Regardless of the implementation detail, the scan interface remains the same and hides the underlying mechanics. We are also committed to high test coverage, aiming for over 80% with at least 7 unit tests. These tests will verify the scanner's behavior across several scenarios: basic glob matching, recursive searching with **, handling of non-existent directories, alphabetical sorting, and correct interpretation of every supported glob syntax (*, **, {a,b}, [a-z]).
Ensuring Reliability: Error Handling and Test Coverage
When developing any file system utility, error handling is not just a best practice; it's a necessity. Our directory scanner must be resilient. One critical case is permission errors: if the scanner encounters a directory it doesn't have permission to read, it should not halt execution or crash the application. Instead, it should log the issue or simply skip that directory and continue scanning the locations it can access, so that it still provides partial results in restrictive environments. The acceptance criteria explicitly require an empty array for a non-existent directory, which is one form of error handling, but the same principle extends to other failures such as file system inconsistencies or read errors. The goal is a scanner that is predictable and non-disruptive.
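One way to get this "skip and continue" behavior is to wrap the directory read in a helper that never throws. The sketch below is an assumption about how it might look, not the shipped code: readDirSafe, SkipReport, and the onSkip callback are hypothetical names, and reporting through a callback is just one option alongside logging or silently skipping.

```typescript
import * as fs from "node:fs";

// Hypothetical report shape: which directory was skipped and why.
type SkipReport = { dir: string; code: string };

// Read a directory, returning [] instead of throwing when the read fails
// (for example ENOENT for a missing directory or EACCES for no permission).
function readDirSafe(dir: string, onSkip?: (report: SkipReport) => void): fs.Dirent[] {
  try {
    return fs.readdirSync(dir, { withFileTypes: true });
  } catch (err) {
    // Node attaches a string error code to file system errors.
    const code = (err as NodeJS.ErrnoException).code ?? "UNKNOWN";
    onSkip?.({ dir, code });
    return [];
  }
}

// Usage: collect skipped directories instead of crashing the scan.
const skipped: SkipReport[] = [];
const entries = readDirSafe("/path/that/does/not/exist", (r) => skipped.push(r));
console.log(entries.length, skipped);
```

A recursive walk that calls a helper like this for every directory keeps going past unreadable branches, and the caller can decide afterwards whether the skipped directories are worth surfacing.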
To validate these behaviors, comprehensive testing is key. The test cases that must be covered are: basic glob patterns, recursive glob patterns (especially **), handling of non-existent directories, and alphabetical sorting of results. We also need to confirm that each supported glob feature is interpreted correctly: the wildcard *, the recursive wildcard **, the alternation {a,b}, and the character class [a-z]. The target is over 80% test coverage with at least 7 unit tests, with each test focused on a single aspect of the functionality so that a failure points directly at the behavior that broke. The implementation will reside in src/primitives/directory-scanner.ts, making it a self-contained and easily manageable primitive. Our definition of done also includes code review and approval, followed by completed documentation, before the directory scanner is considered production-ready.
In conclusion, this primitive directory scanner is a vital step in building a more capable personal AI infrastructure. By providing a reliable and efficient way to discover files using glob patterns, it helps users organize and interact with their data, whether that means searching for specific configuration files, organizing project assets, or managing a large collection of documents. The focus on robust error handling, comprehensive testing, and a small, clear interface is what should make it a dependable addition to a developer's toolkit. For further exploration of file system operations and globbing in Node.js, the Node.js documentation for the path and fs modules is a good starting point, and the npm page for the glob package offers detailed examples and advanced usage patterns.