Hard Drive File Search Tools

Sparweb (Aerospace)
At work we have a library of reference data, which all of our 30+ engineers can access and refer to as they work.
It contains design manuals, regulatory guidance, vendor datasheets, industry standards, etc. The library has the following approximate dimensions:
120 GB, 170,000 files, 7200 folders
It grows significantly every year.
I have found that newer members of our engineering staff, not familiar with our knowledge base, have trouble using it.
For our younger engineers, it is a matter of believing the information exists and/or that we could possibly know something that Google doesn't.
For our older engineers, like me, when things are stored on a path I wouldn't expect to follow, I have trouble finding other people's additions.

There was a time that Google Toolbar could be used to search and efficiently find information in the library, but GT has gone by the wayside.
To find things now, we rely on a rough system of organization... and the horrible Windows search bar above the folder viewer.

I've been looking at file search tools and found some candidates - but I thought I'd ask if Eng-Tips members have any recommendations before I commit?
"Copernic" and "UltraFileSearch" are the two contenders I like.


STF
 
Cool, two votes for "Everything" now. It only searches file names, not contents, but it's very fast. I already like it a lot for finding files which have a descriptive name. Reinforcing my habit.

I let UltraFileSearch run overnight with a file content search parameter, and received all of the expected results. It also turned up some results that I didn't expect, all of which contained material on the desired subject, which is EXACTLY the point, so I'm pretty happy with it. [smile] The search did take nearly 2 hours as I predicted.
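For anyone who wants a crude, unindexed content sweep without installing anything, Windows' built-in findstr can do it over plain-text formats. A rough sketch only: the library path and extensions below are made-up examples, and findstr can't see inside PDFs or Office files, which is exactly why dedicated content indexers exist.

[code]
rem Recursive (/s), case-insensitive (/i), literal-string (/c:) search,
rem printing only the names of matching files (/m). Path and extensions are examples.
cd /d D:\Library
findstr /s /i /m /c:"interference fit" *.txt *.csv *.htm
[/code]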

STF
 
Consider, too, that a single machine indexing files might well be a bit different to 30+ people all attempting to index and search a shared folder arrangement. This is also the implication of getting IT involved: they're often the ones that need to support whatever arrangement is in place, and they're really good at putting roadblocks in if they aren't consulted.

It would appear to me that, for 30+ engineers, the file indexing tool you're testing would need to be installed on each machine in order to work the way you're expecting. That would likely have licensing implications, as well as the possibility of increasing network traffic and inadvertently slowing down file access.



EDMS Australia
 
FreddyNurk - all software that controls files uses links. I haven't broken a wiki link in a decade of use; why would anyone want to do that? For links between pages, the software updates the links when the pages are renamed. Since files within the wiki cannot be renamed, there isn't a way to lose links to them, the way an OS indexer will. Same goes for Autodesk Vault, PTC Windchill, and hundreds of other applications that use a database front-end to manage documents.
 
3DDave, the context is typically things like wikis, or Sharepoint, or similar implementations that link directly to filesystem shares (given the context we're talking about is indexing file shares). People drag and drop files across a file system, rename the filename (e.g. from FILERev2 to FILERev3), delete files and so on, and the links break. Wiki links that reference other pages in a wiki are a different context.

Your example of Vault is exactly what I was talking about when I mentioned the system being the only means of locating a file: if a user can't access the file storage, then they can't break the link.

EDMS Australia
 
Since the files are controlled by the Wiki (because the Wiki has a file vault), there isn't anyone dragging or renaming anything. Wikis typically don't refer to network locations or filesystem shares; that would defeat the point of a wiki. I am not certain how I would get the Wiki I use to do so.
 
We store (typically MS) documents in Sharepoint as that's what management and IT like and think is the solution to information capture & sharing. But once you've added a doc to that system, you might as well say goodbye to it.

Fortunately we also have a wiki that allows us to embed URLs to those Sharepoint docs. So the big MS docs aren't actually lost. We effectively hand-index them in wiki pages that mention them in context.

Steve
 
At the risk of labouring the point when no-one else was interested:

Following reports that Windows search didn't index files that hadn't been opened, I did some checks on my system (Windows 10). I copied a .doc file to an un-indexed location, made some changes, then saved it with a new name as both a .doc and a .pdf, copied the new files to the original folder, observed the index update process, and did some searches. What I found was:

1. Reindexing the new files (about 5MB each) took about a second (see screen-shot below).
2. Searching for old content found it in all 3.
3. Searching for words only in the new files found it in both of them, even though neither had been open in an indexed location.
4. It looks like only single words are indexed. If you enter a phrase it will find all files containing any of the words in the phrase, but it doesn't seem to work for an exact phrase enclosed in quotation marks.
5. It indexes personal names, but not non-dictionary words. I have no idea how comprehensive its name list is.
6. You can sort the list by relevance, which seemed to work pretty well.
7. If it finds files with the search term it updates almost immediately, then continues searching; I presume in non-indexed folders.
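On the scripting side, an indexed search like these can also be launched from the command line with the search-ms: protocol, which opens an Explorer search scoped to a folder. A hedged sketch, since the exact quoting rules vary and the path here is just an example:

[code]
rem Open an indexed search for "fracture toughness" scoped to D:\Library
rem (%20 encodes the space in the query):
start "" "search-ms:query=fracture%20toughness&crumb=location:D:\Library"
[/code]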

I do recall being just as unimpressed with Windows search at one time as others here, but I think more recent versions are vastly improved. I don't recall if there was a big change between Windows 7 and 8, but I think the 8 version worked much as in 10.

Regarding monitoring the indexing, the Indexing Options dialog now looks as shown below. If you copy new files to an indexed folder it briefly notifies that it is indexing, then shows the revised number of indexed items.

[Screenshot: Indexing Options dialog showing the count of indexed items]




Doug Jenkins
Interactive Design Services
 
3DDave said:
Since the files are controlled by the Wiki (because the Wiki has a file vault,)

Oh, now I see. When you first suggested it, I thought the wiki operated like some kind of interface that writes the HTML code for me (like Wordpress or Frontpage), just in the background, so I did not believe it could deal with the file organization. That isn't what you're talking about at all. I was already a little daunted by all the page-writing needed to make a readable Wiki, but now I realize this would definitely require IT support to create a separate server for the users to access. Eventually it would randomize all of the semi-ordered file structures currently in place, and access to the data would require the index (though admittedly the index seems robust).

It might help to illustrate the file structure I currently use, which I've talked about but maybe haven't made clear to everyone:
For example, to find a PDF copy of Peterson's Stress Concentration Factors,
[ul][li]..\Engineering\Textbooks\Peterson\Peterson - Stress Concentration Factors (1st Ed).pdf[/li][/ul]
I don't really need to do a search for "Peterson" since I can just click on the textbooks folder, and pick the right book.

Another; ESDU Paper 68002, SHAFTS WITH INTERFERENCE-FIT COLLARS,
[ul][li]..\Engineering\Design_Manuals\ESDU Papers\ESDU_by_Subject\Mech Systems\68002.pdf[/li][/ul]
This is a bit more of a challenge, since the ESDU papers themselves are not sorted in any particular order, nor given any useful names, except the files I have renamed. So I made the index easy to find first, and from there all other subjects and titles within the document numbers can be found.

Hopefully, this gives an idea that the documents have been organized in some fashion, even if calling it "curated" would be an exaggeration. A user can select a general subject, and then narrow it down.

Some might find my insistence on referring to DOS-style computer folders to be an obstacle. I may have to concede that using a DOS file structure to organize documents whose subjects are interwoven is both old-fashioned and difficult for newbies to absorb. In my defense, my system of organization is modeled on two other similar reference file libraries that I was able to learn and use efficiently on other people's machines. I later copied much of that system of organization for my own purposes.


IDS said:
At the risk of labouring the point...
Hi Doug,
No, your persistence is greatly appreciated!
I accepted long ago that I will be using Windows for maybe the rest of my life, so it behooves me to learn to use it as effectively as possible.
Here's a screen-shot to match yours:

[Screenshot: Indexing Options dialog with status text "Indexing speed is reduced due to user activity"]


Sorry, I didn't notice the status text at the top of the window until you posted your screen-capture:
"Indexing speed is reduced due to user activity."

Searching this subject turned up links which suggested this registry edit:
[ul][li]HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Search\Gathering Manager[/li]
[li]DisableBackoff = 1[/li][/ul]

The service (or the computer) has to be restarted after the registry change.
Later comments on the Microsoft forum recommend turning it back OFF again once it's done, or it could unnecessarily slow the whole system down.
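For reference, the same edit (and the service restart) can be done from an elevated command prompt instead of regedit. A sketch only; on some systems this key is protected and the write will simply be refused:

[code]
reg add "HKLM\SOFTWARE\Microsoft\Windows Search\Gathering Manager" /v DisableBackoff /t REG_DWORD /d 1 /f

rem Restart the Windows Search service (WSearch) so the change takes effect:
net stop WSearch
net start WSearch
[/code]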
I'll give this a try, but it means I have to hit "Submit" before I reboot.

STF
 
Okay.
"Indexing in progress..." Better than before.

To do this, the registry edit didn't work. I'm not sure what was really wrong, but I suspect it was the wrong key to edit for my version of Windows 7.
Instead, after a bit more searching of the 'net, I found this: the indexing service should be stopped while the Group Policy setting is changed, then restarted for the change to take effect, without a Windows reboot.

The indexing service is now reading through about a megabyte per second, so maybe I will have access to better search results soon.
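(Back-of-envelope: at roughly 1 MB/s, the 120 GB work library described at the top of the thread would be about 120,000 seconds of reading, a day and a half of continuous indexing, so a full rebuild is an overnight-plus job.)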


STF
 
3DDave, it's trivial for users to do what you don't expect with a wiki. If linking is as simple as using square brackets, then people will cut and paste links to file systems into it (e.g. G:\Somestuffhere\MoreStuff\Myfile.docx); it's then up to both the wiki parsing the link and the underlying operating system whether that does something meaningful or not. Why go to all the effort of loading a file into the wiki when they can just copy the link?

It appears that we agree, though: whatever system is in place needs to control the location and storage of the file, rather than linking to a shared folder structure. A lot of document management systems provide (unrestricted) access to storage structures, because that is what users demand, rather than what is best for the system.

SparWeb, this might well be relevant to your testing of indexing, at least in terms of shared drives. Bear in mind there is also the possibility of having fileshares available from machines running something other than Windows.

EDMS Australia
 
FreddyNurk - no, it isn't. I guess when I wrote that I have no method to add a link to an external file location, you thought I was lying, so you made up an example. Or maybe you use a particular wiki that, for no good reason, allows it - what's the name of the wiki software that you are using?

What happens with TikiWiki is it tries to create a page with that path as a name. I expect that MediaWiki (which is what Wikipedia is running) also doesn't allow users to create such links because, wait for it, wikis are run through browsers and their users will be on different computers which won't have the same local files, making such links just plain garbage. Which is why the wikis I know about have managed file storage and link to those or to other universally accessible locations.

SparWeb - the wiki pages can have exactly the same hierarchy as you have now, but they can also represent any number of others as well, at the same time. I bet you have textbooks and procedures and papers in 200 different folders on the subject of heat treating; they could also be on one page about heat treating, grouped by material if you like. Or referenced on individual alloy pages as appropriate.

The problem for the new guys is the library has no card catalog and they didn't create the gross filing system, so they don't know where to look. The substitute for a card catalog isn't a list of all the words in all the books with the book title and shelf number next to them; at the very least because sometimes the pertinent words aren't even in the book because they used a synonym. Even the classifications (Dewey Decimal or Library of Congress) often used in public libraries stink and librarians know it; too many books should be filed into multiple categories, but cannot because a physical book can only appear in one place, so librarians make do. In addition, when a new category needs to be created, often it has to be interpolated. Ultimately all of them have failed, forcing books to be mis-categorized.
 
IDS said:
I do recall being just as unimpressed with Windows search at one time as others here, but I think more recent versions are vastly improved.

Bingo. The change to Group Policy fixed my problem.
To be specific about what I did for my Windows 7 system, rather than rely on an external link:
[ul]
[li]Windows Start, Run (or Win+R)[/li]
[li]type "gpedit.msc"[/li]
[li]Choose: Computer Configuration\Administrative Templates\Windows Components\Search[/li]
[li]Select "Disable indexer backoff", and set it to Enabled[/li]
[li]Windows Start, Run (or Win+R)[/li]
[li]Type "services.msc"[/li]
[li]Scroll to "Windows Search"[/li]
[li]Right click and select "Restart"[/li]
[/ul]
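For anyone wanting to script this across several machines rather than clicking through gpedit, the policy appears to come down to a registry value. An untested sketch; the Policies path below is my assumption of where that setting lands, and gpedit.msc remains the supported route:

[code]
rem Believed equivalent of setting "Disable indexer backoff" to Enabled (use /d 0 to revert):
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\Windows Search" /v DisableBackoff /t REG_DWORD /d 1 /f

rem Restart the indexer without a full reboot:
net stop WSearch
net start WSearch
[/code]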

I left the indexing to run overnight, and the results have improved remarkably. The search term "fracture toughness" turned up hundreds of documents. It also reveals how often I have duplicated or triplicated many documents, storing multiple copies in multiple subject folders. This avoids the Dewey/LC limitation that 3DDave mentioned above, and clearly I haven't handcuffed myself that way.

The indexing is still going, and in the time it's taken to write this, the "fracture toughness" search window has turned up two more items.
If I find that the indexing is slowing the system down later, this is fairly easy to toggle back to Disabled. I may flip it back and forth from time to time, or when I make big additions to the library.

FreddyNurk said:
...this might well be relevant to your testing...
Freddy, that will help a lot when I start to make this available to coworkers, thank you! As I mentioned before, I'm in "the sandbox" doing this on my home computer's library of stuff, but clearly my next step at the office will be to allow network locations to be indexed, too.

STF
 
Dik:
In Linux, the 'find' command will find files by name, or part of a name.
Like all 'simple' Linux commands, it has a bazillion options, only a few of which will be useful to you. Some options can be dangerous, nonintuitive, or both, so read 'man find' carefully. If you're in a hurry, skip down to the examples and explanations.

I don't think find does any background pre-indexing, but it appears to do some caching, so its speed improves with use (as you experiment with options)... and I think _some_ Linux filesystems (there are many) do indexing or something. Find is generally pretty fast.
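A couple of the options that earn their keep, with an example mount point standing in for wherever the library lives:

[code]
# Case-insensitive match on a file name, or part of one:
find /srv/library -iname '*peterson*'

# Restrict to PDFs modified in the last 30 days:
find /srv/library -iname '*.pdf' -mtime -30

# ('locate', fed by the updatedb index, is the pre-indexed cousin of find.)
[/code]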


To search by content, you can use 'cat' piped to 'grep'. Again, many options, many behaviors, not always what you expect.
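Strictly speaking, grep doesn't need cat; it takes file names and can recurse on its own. A sketch with the same made-up library path, remembering that grep only sees plain text, so binary formats like PDF need an extractor (e.g. pdftotext from poppler-utils) first:

[code]
# -r recurse, -i ignore case, -l list matching file names only:
grep -ril --include='*.txt' "fracture toughness" /srv/library

# For a PDF, extract the text and pipe it through grep:
pdftotext 68002.pdf - | grep -i "interference"
[/code]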


I know of no Linux equivalent to Agent Ransack, which is one of the first external programs I install on any Windows system I have to use.
One of the first programs I install on a Linux system is Midnight Commander, a text-based file manager, often known as just 'mc'.






Mike Halloran
Pembroke Pines, FL, USA
 
I suggested Linux (threw it out there) because its file management exceeds Windows'... I have Midnight Commander... an excellent manager... and just searching for the most recent version, I found that there is a Windows version of the program.

Dik
 
Personally I HATE it when my computer is spending 95% of its time indexing and 5% doing what I ask. Bazillions of throwaway temporary files (re)searched every time I move my mouse. No wonder people spend their money on mega-multi-core machines these days. You need all those cores just to try to keep the indexing under control.

[end of rant]

Steve
 
SomptingGuy,
It is possible that sometime in this computer's life, I may have put a stop to constant file indexing, thus causing the very problem I now need to solve.

It's still indexing, BTW, at high priority, and it's over 425,000 files. Jeez, will it ever stop? So far I haven't needed to do anything very demanding (spending all my time on Eng-Tips!) so I haven't been tempted to make it back off yet.



STF
 
I have found that an indexer that indexes content is dreadfully slow and very intrusive unless it is on a server and managed carefully. I have taken the position that file names simply have to be descriptive, and I use Everything. My searches are not always fruitful...
 