Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

Hard Drive File Search Tools

Status
Not open for further replies.

Sparweb

Aerospace
May 21, 2003
5,103
0
36
CA
At work we have a library of reference data, which all of our 30+ engineers can access and refer to as they work.
It contains design manuals, regulatory guidance, vendor datasheets, industry standards, etc. The library has the following approximate dimensions:
120 GB, 170,000 files, 7200 folders
It grows significantly every year.
I have found that newer members of our engineering staff, not familiar with our knowledge base, have trouble using it.
For our younger engineers, it is a matter of believing the information exists and/or that we could possibly know something that Google doesn't.
For our older engineers, like me, when things are stored on a path I wouldn't expect to follow, I have trouble finding other people's additions.

There was a time that Google Toolbar could be used to search and efficiently find information in the library, but GT has gone by the wayside.
To find things now, we rely on a rough system of organization... and the horrible Windows search bar above the folder viewer.

I've been looking at file search tools and found some candidates - but I thought I'd ask if Eng-Tips members have any recommendations before I commit?
"Copernic" and "UltraFileSearch" are the two contenders I like.


STF
 
I recommend a Wiki. It allows additional metadata, commenting on the value of the files, access control if that's of any importance, and the creation of indices independent of path structures. The most important factor is that it allows users to link directly from one document to another so that document-to-document traceability can be done; this also allows spotting documents that are missing by the lack of an active link. For example, I have used my personal Wiki to keep project documents; there have been many times when I finally needed to refer to a previously skipped document, and when I added it I found that it was referred to by dozens of others that had speculative links in them.

I find it is valuable to extract file textual contents for inclusion in pages, saving the time required to fire up some application, though the page still retains a link to the original document - this is where the embedded links are placed. I have had no problem loading the equivalent of several hundred MIL standard pages as a single Wiki page; I have found that for some it still makes sense to break them out by chapter.

Another wonderful feature is that a Wiki keeps history, eliminating having to create new file names for every version. The individual pages get not only histories but also difference reports showing what changed between any pair of versions (another reason for text extraction).
 
I have used Copernic in the past, but now I use Windows search and find it excellent. As long as it is set up to index all the folders with accessible information, and to index pdf and anything else containing readable text, it is fast and finds all relevant files. I particularly like that you can set it up to show a readable preview of each file, which makes it much easier to skip over irrelevant hits.

Why do you find it horrible?

Doug Jenkins
Interactive Design Services
 
Wow, thanks to everyone!
IDS, thank you for the tip on the PDF searches. About 95% of the files in my library are PDFs.

3DDave,
I'm a bit confused by your suggestion - I am probably too ignorant to understand it. My mental image is that I would view an HTML page with a web browser, using local file addresses rather than an internet address, reading a bunch of subject-specific pages that my colleagues and I have spent time collectively writing and editing over a period of months or years. Like Wikipedia for our network servers. If you could convince me that this would be a productive pursuit then I'd be happy to start now. Unfortunately, I lack the charisma to convince anyone else in the department, so the effort to write all this would be entirely mine. If I have completely misunderstood, I hope you can forgive me.

Dik,
DOS list is pretty much what I do now!

STF
 
My favorite Win search tool is Agent Ransack, which is fast, and can find stuff by filename or by contents, including inside pdfs and a host of other formats.

It is free from mythicsoft.com, for personal and commercial use.

It has a big brother paid app, FileLocator Pro, which is claimed to do even more, so it must be incredible.

My wife is a total computer Luddite, who got herself a job as a secretary, of all things.
I was able to walk her through:
Finding mythicsoft.com,
Downloading Agent Ransack,
installing it,
and using it to find a bunch of files that were mis-filed by someone else,
just talking to her over the phone.
I can think of no higher accolade.





Mike Halloran
Pembroke Pines, FL, USA
 
...but doesn't fix the other problems with the Windows search bar, of course... [sad]
This is still an improvement, but there are still some quirks.
Text within a PDF can be found only if I have opened the file. If the file has not been opened, then the text string is not found.
For example, the word "toughness" appears in most of the 40 files I have in a particular material properties folder.
I know this because all of the files were generated by the same source (Boeing) and follow the same format, just for different materials.
I have opened 4 of the files to confirm that my memory is not faulty.
When I do a Windows search in this folder, the only files that appear in the result are the 4 files I have already opened.

Checking with Task Manager/Resource Monitor, I briefly saw the indexing service looking through the files as I accessed them, but the rest are ignored. In the time it's taken to write this up, the indexing service has stopped indexing again. So... the only way to index all of the PDFs is to open every. single. one.

I'm considering the need to "Rebuild Index", which is a button in the control panel menu that I previously ignored. Just have to give it the time it needs to run the whole index all over again.

STF
 
Wikis have web servers, not local addresses. They also have file storage, which they can index, and they have pages, history, links to external web pages on other sites, and many other features you may find useful or may ignore. No one needs to spend months or years. Where did that idea come from? You already spent that amount of time on your current system and you don't like it.

Wikipedia is a particular wiki. Most of them are in the CMS, Content Management System area. They manage content. You have content that you need to be managed so that users can find what they are looking for. My guess is that it's less than 1% of what you have now with no traceability to any projects you use it for.

If you create any documentation then that usually has some basis. So a bracket might have a material spec, a finish spec, a next assembly, a requirements document, a stress report, some meeting notes relating to the bracket, an outside supplier, and so forth.

Here's the typing that might be required for part XYZ:

((SAE AMS QQ-A-250))/11
((finish spec))
((next assy))
((XYZ Requirements))
((XYZ Stress Reports))
((XYZ Meeting Notes))
((link to PDF file of drawing))
Supplier: ((CAGE Code))
Responsible Engineer: ((Bob the Builder))
Contract: ((123xyz))

Everything in (( and )) is a link to a page. If the page doesn't exist, when it is selected it asks if you want to create the page. So it's easy enough to create scaffolding; unlike some systems which won't let connectors be created without a place to connect to.
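
As a rough illustration of the "scaffolding" idea, here is a minimal Python sketch that scans page text for ((...)) links and lists the targets that have no page yet. It only assumes the double-parenthesis syntax shown above; the function names and the in-memory `pages` dict are hypothetical stand-ins, not any particular wiki's API or storage.

```python
import re

# Matches ((Page Name)) links in the double-parenthesis style shown above.
LINK_PATTERN = re.compile(r"\(\((.+?)\)\)")

def extract_links(page_text):
    """Return the page names referenced by ((...)) links, in order."""
    return [name.strip() for name in LINK_PATTERN.findall(page_text)]

def missing_pages(pages):
    """Map each link target that has no page yet to the pages that refer to it.

    `pages` is a dict of {page name: page text}; it is a hypothetical
    stand-in for the wiki's real storage, which this sketch does not model.
    """
    wanted = {}
    for name, text in pages.items():
        for target in extract_links(text):
            if target not in pages:
                wanted.setdefault(target, []).append(name)
    return wanted

# Two example pages; the names are made up for illustration.
pages = {
    "Part XYZ": "((XYZ Requirements))\n((XYZ Stress Reports))\nSupplier: ((CAGE Code))",
    "XYZ Requirements": "Flows down to ((Part XYZ)) and ((XYZ Stress Reports)).",
}
print(missing_pages(pages))
```

A target that several pages already point to, but which does not exist yet, is a good candidate for the next document to add.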

Or, however you care to look at the data. Maybe you have an ERP system that does some of that. So skip it; this is an example of what it could be used for, not what you have to use it for.

The one I use is TikiWiki. There are others.

Perhaps, as Henry Ford suggested, you are just looking for a faster horse.
 
Sparweb... my library started about 25 years ago... I wasn't aware of wikis... going to look into this. Maybe too many files...

Dik
 
has Everything, which is a faster indexer of filenames than the default Windows indexer and can be configured to boot with Windows. If you have descriptive filenames, that's an option.
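
To show what a filename-only index buys you, here is a minimal Python sketch that walks a folder tree once and then answers name searches from memory. It is not how Everything works internally; the function names and the demo files are assumptions made up for illustration.

```python
import os
import tempfile

def build_filename_index(root):
    """Walk `root` once and return a list of (lowercased name, full path)."""
    index = []
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            index.append((filename.lower(), os.path.join(folder, filename)))
    return index

def search_filenames(index, *terms):
    """Return paths whose file name contains every term (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    return sorted(path for name, path in index if all(t in name for t in wanted))

# Demo on a throwaway folder with two empty, hypothetical files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "materials"))
    for name in ("7075-T6 fracture toughness.pdf", "2024-T3 fatigue data.pdf"):
        open(os.path.join(root, "materials", name), "w").close()
    index = build_filename_index(root)
    hits = search_filenames(index, "toughness", "7075")
    print([os.path.basename(p) for p in hits])
```

The search never opens a file, which is why this approach only helps when the filenames themselves are descriptive.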

OK, but it's indexing of the content of pdfs and Office files that I find really useful, and once it is set up properly, Windows does an excellent job of that.

Doug Jenkins
Interactive Design Services
 
So I let Windows rebuild the index overnight... and it seems to have built an even more incomplete index. Resource monitor displays virtually no activity by the indexing services.
Forget it!

3DDave, I was thinking of an Access database, so thank you for giving more detail on the wiki. Now I'm picturing a project-specific or part-specific page of designer's notes, which cross-reference the source data (wherever it's stored the link can be made) that the designer used. OK now I think I have a more complete picture. I write notes like that all the time, often in Notepad, and including knowledge base links as I use them. No it's not clickable links, but cut-and-paste is so easy for a keyboard-oriented user like me. The wiki idea seems better as a tool to cultivate in all of my colleagues the habit of making notes during their design process, and of doing it in a way that others can benefit from... yes the philosophy of wikis seems to be sinking in. [smile]



STF
 
I have been using Copernic (paid version) for over twenty years. I may be suffering from a failure of imagination, but I cannot think of any area in which it is deficient: it indexes "on the fly"; it finds things blindingly fast; it has a good repertoire of advanced searching features.

I am in the process of upgrading to version 7. "In the process" because I have struck a few difficulties in getting it to activate its "extensions". My current thinking is that this is because my Windows 10 computer somehow acquired an invalid name when it was set up, and Copernic's licensing / security (extensively revised for version 7) makes use of a computer's name. I am getting good help from Copernic Support, but our diametrically opposed time zones mean we can only achieve one question / answer per 24 hours.

One thing I have already noticed about Copernic v7, as a consequence of my difficulties activating its extensions: its free version is much less capable than was the free version two decades ago.
 
SparWeb - If you go to the Indexing Options dialog:

Does it indicate that indexing is complete?
Are all the folders that need to be indexed included in the list of indexed locations?
Under "Advanced" do all the file extensions that should have content indexed have an appropriate filter? e.g. "PDF filter" not "File Properties Filter"

If those three things are complete or set correctly, I don't know why the index would be incomplete.


Doug Jenkins
Interactive Design Services
 
There's a hidden risk in terms of linking to file storage locations that everyone has access to, in that it's possible to shift files around, which breaks the index (or wiki link) and leads to great reluctance to use the links.
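
One way to catch this early is a periodic check that every stored link still points at a file. The Python sketch below is only an illustration under that assumption; `find_broken_links` and the (label, path) list are hypothetical, and a real wiki or index would need its own way of exporting its links.

```python
import os
import tempfile

def find_broken_links(links):
    """Return the (label, path) pairs whose target file no longer exists."""
    return [(label, path) for label, path in links if not os.path.isfile(path)]

# Demo: two links, but only one of the target files is actually present,
# as if the other had been moved by another user.
with tempfile.TemporaryDirectory() as root:
    kept = os.path.join(root, "stress_report.pdf")
    moved = os.path.join(root, "drawing.pdf")
    open(kept, "w").close()
    links = [("XYZ Stress Report", kept), ("XYZ Drawing", moved)]
    broken = find_broken_links(links)
    print([label for label, _ in broken])
```

This only reports that a file is gone; finding where it went would take a further search by name or content.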

There is also the consideration that should be given right from the outset, as to how many users will need to access the system at once. This is a great drawback to the typical Access DB: normally it gets stored as a file somewhere, and simultaneous access to the DB ends up breaking it. The better alternatives (e.g. MySQL, PostgreSQL) involve setting up some sort of server specifically to handle it, which involves much more effort, and much negotiation with IT. They come with a steep learning curve for beginners to DBs too.

Content / Document Management Systems can get around some of the initial setup of the DB, and can negate the issues of files shifting around providing they are set up to be the only means of locating the particular file. They don't work for every content type (SolidWorks / Inventor / SolidEdge are good examples of ones that don't) unless specifically configured to manage them, and can often come with their own issues, particularly if not set up properly. They will generally index far faster than a Windows Explorer interface though, and can allow for searching via more metadata than is exposed to the filesystem.

Having used a number of different configurations (user editable wikis, SharePoint, Document Management Systems, and good old Windows Explorer) I am aware of a lot of the pitfalls of each system. Locking down file share systems greatly limits the potential damage due to cryptolocker issues too.



EDMS Australia
 
More good ideas, thank you.
Freddy, and Doug, I will not ask for support from IT, given that they seem to be understaffed already.
Currently, I am trying out ideas on my home computer. It's a pretty good sandbox for testing, because I have a bit of a library of my own.
My home computer is Win7, the ones at work are Win10. That may make a difference in how the file indexing service works.
At work, we do have content management for our approved and controlled design data (Autodesk Vault). The issue at hand is for data not controlled by this system, and probably shouldn't be controlled by it either. Since the Vault is currently only configured to manage Autodesk and MS Office data files, I would be imposing more IT support to extend it to the zillion PDF files floating around.

Doug,
Yes, and yes, although I see no part of the dialog boxes that attempts to indicate either the status of the indexing, or whether it is complete or not. It does have a button that lets me pause it for 15 minutes. So far, I have been using Task Manager/Resource Monitor to determine if the search index process is even running. Usually it's minimal. Actually, it's funnier than that: whenever I access Resource Monitor and select the Search Protocol Host, it's usually running but not doing anything. If I leave it selected for a minute, it suddenly "gets busy"! There is a faint hope, that if I give it a few months, it will eventually index everything on my hard drive.

IRStuff,
Thank you. Immediate results, if only for the files that actually have descriptive filenames. A lot of them do, so it's a great stopgap until I can choose a content searcher.

I'm about to try UltraFileSearch and Copernic (free trials) next.

STF
 
Okay... UltraFileSearch took about 5 minutes to download and set up, needs no tweaking to work, and about 30 seconds later got results.
Can even cross-search file name terms with file content terms, but it takes longer. After 15 minutes it's searched about 16,000 files and found plenty of correct hits. [smile]
With 170,000 files to search, it may take 2 hours to finish. [hourglass] It is not using indexing, so every subsequent search through the same file contents will take just as long.
The search seems to have started with the oldest files, so the last files to be found will probably be my most recent additions.
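
For comparison, this is roughly what an indexing tool does differently. The Python sketch below builds a word-to-files index in one pass, so that later searches are lookups rather than a fresh read of every file. It assumes the text has already been extracted from each file; the function names and sample documents are made up for illustration and do not describe how UltraFileSearch, Copernic or Windows search are implemented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Read every document once and map each word to the files containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, *words):
    """Return the files that contain every query word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical extracted text from three files.
documents = {
    "7075-T6.txt": "Fracture toughness and tensile strength of 7075-T6 plate",
    "2024-T3.txt": "Fatigue data and fracture toughness of 2024-T3 sheet",
    "finish.txt": "Anodize finish specification",
}
index = build_index(documents)
print(search(index, "toughness"))
print(search(index, "toughness", "fatigue"))
```

The slow part (reading every file) happens once in `build_index`; the trade-off is that the index has to be kept up to date as files are added or changed.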

I was thinking it might also be useful for my photo collection... but probably too slow for that, too.
 