Structural PDF Document Filing System

Status
Not open for further replies.

KootK

Structural
Oct 16, 2001
18,085
Every few months, someone here asks me about my PDF filing system (Link). I've been contemplating posting a cleared out, open source version of the MS Access database here for other people to use or improve. I'm not looking to show off or anything. I've just found my system to be incredibly useful to me and when I think about how I could best make a meaningful contribution around here, that's probably it.

The system allows for:

1) Semi-automated filing.
2) Customizable tagging.
3) Very powerful searching.
4) Push button display of PDF docs.
5) Retrieval independent of doc location.
6) Pseudo integration with cloud storage.
7) I provide limited customization for a few old friends that use it.

The tough part of sharing the database is that there's no user's manual. And I'll never get around to writing one. If I had a critical mass of members here who were committed to giving it a go, I thought that I could just post the file here with some incomplete instructions and, together, we could fill in the blanks via question and answer to turn the thread into a living user's manual of sorts. This would represent a fair bit of work for me, however, so I'd like to get buy in from at least 7-10 members before I embark upon the project. From past discussions, I know that sponton and dcarr are already on board.

As a beta tester, the commitment that I would expect from you would be threefold:

1) You would need access to MS Access 2010 or later.
2) You would need to make an earnest attempt to use the system to organize your PDF library.
3) If you make a nifty improvement of your own, share it with the gang.

If you're interested in collaborating with me on this project, please let me know.

Capture_ae4rlr.jpg


I like to debate structural engineering theory -- a lot. If I challenge you on something, know that I'm doing so because I respect your opinion enough to either change it or adopt it.
 
Have you ever used Copernic and how would you say your database compares?...Not to rain on your parade but I'm hesitant to leave a setup that works so well already without any effort on my part. Thoughts?

No worries. I wouldn't propose that anyone ditch what works for them unless they see a meaningful benefit. Heck if your system's better, maybe I'll switch. I've tinkered with Copernic and permutations of it. At the recommendation of another member here (can't remember the handle) I've also looked into the potential of setting things up as a wiki which shows a ton of promise.

Here are my thoughts:

1) OCR can sometimes be a problem for me as I collect historical documents in addition to contemporary ones.

2) A search/index system may well make more sense when an entire office is managing a shared PDF collection. My system is kind of built around a single user (me). To be used in a collaborative way, I imagine that the team would need to agree on a few protocols and what a logical system of tags would be.

3) My system does not search within documents themselves. When I feel the need for that, I use other utilities.

4) My system renames documents in a logical fashion according to the information recorded by the user. I find that this makes things easier to find even when I'm just searching Dropbox from a remote computer.

5) This will be the key point and, frankly, will be difficult to explain. Extensive search/index doesn't seem to get me to the particular document that I want fast enough or reliably enough. Sometimes it's a terminology thing. I search for rebar hooks and the article that I really want is AU and calls them cogs. The larger problem, for me, seems to be the retrieval of too much information. My collection is enormous.

I set up tags in my system in a way that is meaningful to me. I'll tag a document according to not just what it contains but, additionally, what it contains that I found particularly useful about the document. I've also created a search function that will search all of the database fields including the "notes" field. Because of that, I make very strategic use of the notes field. I'll use it to:

- Type in the name of the person who led me to the document.
- Record the alternate versions of the name of the document (NIST 1257 vs Raft Foundations etc).
- I might indicate that I found the document posted here on eng-tips.
- Most importantly, I'll include a few terms that reflect why I thought the document was important. And I'll choose those terms with future searching in mind.

As an example, I might filter by the tags "concrete" and "shear wall". This will get me from 12,000 docs to 120 docs. Then I'll filter by the tag "seismic" and be down to 20 docs. Then I'll search or visually scan the list of twenty and see that one of the docs was given to me by CELinOttawa on Eng-tips back in 2006. And since getting it from CEL was probably what I remembered, I'll have what I came for.

In summary, my system works nicely in concert with the search/index system that resides between my ears. And that seems to be what makes it powerful for me.

 
@Ingenuity: glad to have you on board. Depending on your skill with Access, or your comfort in sharing your records with me, we could probably retain your personal Dewey decimal system.

 
@Koot

Great explanation. The tagging is what is so attractive about your system. I'm honestly on the fence as I too work alone and often have an idea in mind of what I'm looking for ahead of time. I'll have to mull this over and see which of these approaches, or maybe both, I'll pursue.
 
The only thing putting me off something like this is the discipline required to enter everything in the first place and then maintain it. I'm the kind of person who has 5 copies of the same thing in 5 locations on 5 computers. I can always find it (eventually!), possibly due to all the copies though!

If there was some way of scanning a directory and grabbing PDF title or previewing the file, and then giving a choice of tags from a predefined list to quickly initially enter a couple of thousand documents it would prove very handy to coarsely screen everything. Documents being used on a more regular basis could have their entries refined over time.
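
A rough sketch of what such a coarse first pass could look like, in Python. This is only an illustration of the idea, not part of KootK's database: the tag list, the function names, and the rule of guessing a title from the file name are all assumptions.

```python
from pathlib import Path

# Hypothetical starter list; a real one would come from your own tag scheme.
PREDEFINED_TAGS = ["concrete", "steel", "seismic", "foundations"]


def guess_title(pdf_path: Path) -> str:
    """Turn a file name into a rough title: drop the extension, swap separators for spaces."""
    return pdf_path.stem.replace("_", " ").replace("-", " ").strip()


def suggest_tags(title: str, tags=PREDEFINED_TAGS) -> list[str]:
    """Offer any predefined tag whose text appears in the guessed title."""
    lowered = title.lower()
    return [tag for tag in tags if tag in lowered]


def scan_folder(folder: str) -> list[dict]:
    """Walk a folder tree and build one coarse record per PDF found."""
    records = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            title = guess_title(path)
            records.append({"path": str(path), "title": title, "tags": suggest_tags(title)})
    return records
```

A file called 5667543335656.pdf would still come through with a useless title and no suggested tags, so this only gets you a coarse screen to refine later.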

If I had to open every PDF, get a feel for what it's about, and then manually type titles, authors, tags etc. to enter it in the database, I can see I'm going to give up before I even start for the sake of my sanity ;)


Years ago I committed to organising my MP3 music collection, took me months to categorise, rename, reorganise, centralise etc. But once it was up and running I just needed to ensure whenever I added some new music that I went through a process to rename, tag and so forth which wasn't so painful on a 1-2 CD basis moving forward. Music collection has grown threefold since then, and I'm glad I did it looking back. I've never really thought of something similar for my engineering documents, but it's an interesting concept which would definitely have some benefits for someone like me who embraces a chaotic level of organisational/storage skills!

Good on you for sharing as well, more often than not we create some really good tools but hold onto them for our own use because it gives us an advantage over the next person. Look forward to any updates.

 
Just for ideas, look into the various ways AutoCAD has set up its drawing files systems.
 
Agent666 said:
If there was some way of scanning a directory and grabbing PDF title or previewing the file

I wish that there was an app that would seek out the metadata for PDF files similar to what seems to be possible for other media files. With quality metadata, I could do something similar to what you're suggesting and it would be great.

Agent666 said:
The only thing putting me off something like this is the discipline required to enter everything in the first place and then maintain it

It's a problem. I can speak to this as I still collect PDF documents at a faster rate than I file them somehow. As such, I've always got an intimidating "heap" lying around in some digital nook.

1) Much of what follows would basically fall under the general tag "do not let the perfect be the enemy of the good". There are few examples of this being more true.

2) Most of the PDFs that I acquire require renaming to be of any future use at all. As my system does a very nice job of automatically (and rationally) renaming, filing the documents properly really isn't much more difficult than renaming them. This is a useful "carrot" that speeds progress. I just filed your SHS document a few moments ago.

3) After six years of filing this way, I've found that a document that's filed properly seems to be about five times as useful as one that is not. If it's filed, I'll make my way back to it. If it's not, I'll often lose it forever or wind up downloading it again and having duplicates.

4) The process described in #3 pretty quickly teaches one which documents are very important. After a misstep or two, I'll realize it and get those documents filed which helps.

5) I'm currently collecting and storing a bunch of metadata that, after six years of filing, I can say is pretty close to useless. I never search based on publisher. I never search based on publication year. Author based searches only seem to benefit me when I'm looking for something produced by a person or company of which I have first hand knowledge (former employer etc). It's not in my DNA to leave fields unfilled but, if one could manage it, one could get by quite nicely only recording the document title and applying the appropriate tags.







 
Yeah, I take the point on renaming getting you 90% of the way. If that's automated to some degree, half the battle is won. Sometimes an Explorer search will not turn up the obvious due to the name not reflecting the contents. 5667543335656.pdf just doesn't cut it sometimes! And Windows Explorer searching within PDFs themselves never seems to get me there.

My 'completely unsorted' folder contains a daunting amount of files, and my 'sort of sorted folder' contains 4,480 files at present so I have the same 'collecting' disease it seems!.... I dare say I have probably never opened let alone read a great deal of the information I have collected.

But the very fact that it's there has helped me and others I work with on many an occasion.

The thirst for knowledge - it's a battle worth keeping on top of it seems. It always surprises me how little information some of my colleagues seek out and retain in a digital format. They have their go-to codes and so forth, but beyond that they aren't filing away interesting tidbits for future use at all.
 
Like theonlynamenottaken I use Copernic, and have done so for longer than I care to remember. This will find Word, Excel and PDF documents for me, plus numerous other file types, based on the body content of the files. Where it falls down is with PDF "documents" that are image-based rather than text-based, since such documents contain no text content to index or search upon. KootK's approach gets around this limitation by indexing his PDFs by their tags, tags that he creates.

Inspired by this thread, I began wondering about defining my own tags for my image-based PDFs. Two possible hurdles immediately occurred to me (plus the certainty of future unknown hurdles if I decide to take it further). I did a bit of testing on these, as follows.

(1) Does Copernic include metadata tags in its indexing? I created a Word document in which I inserted some garbage words as tags. Copernic found the document based only upon a garbage word. Hopefully this will apply to PDF files as well, but this remains to be tested.

(2) Can I add (searchable) tags to a PDF file? I do not have the full Acrobat program, only the Reader. This does not allow tags to be added. Neither (under Windows 7) does Windows Explorer. Some superficial googling suggests that Full Acrobat allows the addition of tags, but again this remains to be tested. (Presumably it does, and presumably this is how Koot does his tagging.)
 
Kootk:

I would like to work with you on your database system. I have a ton of PDFs halfway indexed in folders and I add to this all the time with no real backup except my memory. I have all of my projects back to 1972 scanned or deleted as well as the drawings etc. Email organization still working with Eudora - hard to move over to Outlook. Used to use Copernic but last time I bought it, it came with a lot of unwanted problems. I do use a program called Agent Ransack which, although somewhat slow, works for me. I have used Access in the past. We also have a high speed Epson scanner that will scan both sides of the document. Does really well. We also have a large Kip scanner for drawings - but it doesn't do color. I may have some other things that I can share. I have Acrobat XI Pro and use it with InDesign and Excel. I also have a 15TB Drobo for my system and get on my desktop with my laptop at a remote location (like everyone else in the meeting!)
 
@oldrunner: welcome to the beta test group!

Denial said:
Can I add (searchable) tags to a PDF file?...Presumably it does, and presumably this is how Koot does his tagging

I believe it possible to add metadata tags to PDF documents but that is not presently what I'm doing. At some point down the road, however, it would be a relatively simple thing to dump all of my database tags into the metadata of the relevant documents for use with other technologies.

In my scheme, the tag system runs off of a conventional relational database structure. Three things that I like about that are:

1) No issues with misspelling tags. "Post-Tensioned Concrete" is always "Post-Tensioned Concrete" and is never confused with "Post Tensioned Concrete" or "PT Concrete" etc.

2) I've leveled off at 119 separate tags. Using a relational system, I'm able to group those into categories that each contain 8-10 tags. That way, I can get to the tags quickly. If you look closely at the bottom right of the screen capture, you'll get the idea. A view of the underlying structure is shown below.

3) The database allows each tag to potentially have a "parent tag". If you tag something "precast" it will automatically get the concrete tag as well. If you tag something "shear wall" it will automatically get tagged as "lateral" etc.

Capture_mutqpd.png
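
For anyone who doesn't read Access relationship diagrams, here is a minimal sketch of a relational layout along these lines, written with Python's sqlite3 module rather than Access. All table and column names are assumptions for illustration and are not taken from the actual database. The sample rows mirror the "precast" and "concrete" example above, and putting them in a "Materials" category is also just illustrative.

```python
import sqlite3

# Assumed schema: categories group tags, tags may point at a parent tag,
# and a join table links documents to tags.
SCHEMA = """
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tags (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,       -- one stored spelling per tag
    category_id INTEGER NOT NULL REFERENCES categories(id),
    parent_id   INTEGER REFERENCES tags(id) -- optional parent tag
);
CREATE TABLE documents (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    tag_id      INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (document_id, tag_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one category and two sample tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO categories (id, name) VALUES (1, 'Materials')")
    conn.execute("INSERT INTO tags (id, name, category_id) VALUES (1, 'Concrete', 1)")
    conn.execute(
        "INSERT INTO tags (id, name, category_id, parent_id) VALUES (2, 'Precast', 1, 1)"
    )
    return conn


def tags_in_category(conn: sqlite3.Connection, category: str) -> list[str]:
    """List the tag names that belong to one category, alphabetically."""
    rows = conn.execute(
        "SELECT t.name FROM tags t JOIN categories c ON c.id = t.category_id "
        "WHERE c.name = ? ORDER BY t.name",
        (category,),
    )
    return [row[0] for row in rows]
```

Because each tag is stored once and picked from a list rather than typed, spelling variants never get a chance to creep into the document records.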



 
Agent666 said:
and my 'sort of sorted folder' contains 4,480 files at present so I have the same 'collecting' disease it seems.

Welcome to club digi-hoarder. You may, in fact, be able to usurp me as chairman. 4480... I kinda want to fly to NZ and ransack your 'puter now. I agree though, I harbor the same general disgust for my non-afflicted colleagues who just have the "big four" code manuals on their shelves. Might as well be accountants damn it.



 
9600 - 22.8GB - sort. I get depressed when I look at my unsorted folders. :(
 
I too am interested in testing out your filing system. I have developed a few different project management databases so this is the type of thing that piques my interest.
 
Alright, time to give this a whirl. An empty version of the database is attached to this post. What follows will be the beginning of what will hopefully become the instruction manual. I'll be adding to it piecemeal as time allows and issues crop up.

1.0 Locating the database file and your root folder

You can put the database file anywhere that you like. All of your PDF files must be located within a single root folder that you will later specify within the database. Within the root folder, you can organize your files as you see fit. If you're like me, you've got most of your PDF files already stored in folders that represent an existing filing system (steel, concrete, seismic, etc). If that's the case, just copy all of those folders into the root folder.

On my computer, I've placed both the root folder and the database file within folders that are synchronized with my Dropbox account. This allows me to use the same system at home and at work. The Dropbox business is entirely optional however.

2.0 File naming and how the system finds your PDF files

The system will automatically rename all of your files according to the following nomenclature: [LIB] + [SPACE] + [Six Digit Number] + [User specified document title]. You can't have colons in the file name. This trips me up from time to time. Replace all colons with dashes. The system does not contain your files. Neither does it even contain links to your files.

Capture_gcj11t.png
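
A minimal sketch of the naming rule in Python, for illustration only. The function name is hypothetical, and the exact spacing is an assumption: it follows the "LIBXXXXXX ORIGINAL FILE NAME" pattern mentioned in step 4.6, with the number zero-padded to six digits.

```python
def library_file_name(number: int, title: str) -> str:
    """Build a name like 'LIB000123 Some Title.pdf' (exact spacing is assumed)."""
    if not 0 <= number <= 999_999:
        raise ValueError("library number must fit in six digits")
    # Colons are not allowed in Windows file names, so swap them for dashes.
    safe_title = title.replace(":", "-").strip()
    return f"LIB{number:06d} {safe_title}.pdf"
```

For example, library_file_name(123, "ACI 318: Commentary") gives "LIB000123 ACI 318- Commentary.pdf".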


The system knows your files simply by the six digit library tag (snippet above). When you click the "link" button to have a file displayed, the system hunts through your root folder and all of its sub folders until it finds that six digit tag. Then the system opens the associated file. While a bit odd, this setup has one huge advantage. You can reorganize your files within the root folder without ever worrying that the system will lose track of them. The system never had them tracked in the first place.
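
The lookup described above can be sketched in a few lines of Python. This is not the Access code, just an illustration of the idea. The function name is hypothetical, and matching on a file name that starts with "LIB" plus the six digits is an assumption about how the tag appears.

```python
from pathlib import Path
from typing import Optional


def find_by_library_number(root: str, number: int) -> Optional[Path]:
    """Hunt through root and all sub folders for the file carrying this library number."""
    prefix = f"LIB{number:06d}"
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf" and path.name.startswith(prefix):
            return path
    return None  # nothing in the root folder carries that number
```

Since nothing but the number is stored, moving a file from one sub folder to another changes nothing: the next hunt simply finds it in its new home.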

3.0 Setting the root folder

The system needs to know where you've chosen to keep your root folder. Click the root folder button, navigate to your root folder, and verify that the system has recorded it correctly. You only need to do this once.

Capture_qmtvll.png


4.0 Adding your first file

The system comes with one document record already loaded (AISC 13th). The reason for this is that the system has some issues when it's entirely empty. Once you have other records entered into the system, you can delete this one. Note that the record is not associated with a PDF. That's because I use the system to log both PDF documents and hard copy documents. For me, the AISC manual is a hard copy document.

4.1 You can fetch a PDF document from anywhere but, to get the most out of the example, start with a new PDF on your desktop.

4.2 Ideally, the file name would be less than ideal so that the system can modify it for you.

4.3 Start a new record and click the "Link" button at the end of it. A file chooser dialog box will open.

4.4 Navigate to and select your file. If your file is not located within the root folder sub directory system, the system will ask you if you'd like to move it there. Say yes.

4.5 A folder chooser dialog will open and allow you to navigate to the folder within the root directory where you would like to store the document.

4.6 The system will now have moved your document and renamed it [LIBXXXXXX ORIGINAL FILE NAME]. It has also guessed that you may want the title of the document to be whatever the file name was. That's usually a good place to start and sometimes saves some editing effort.

4.7 Adjust the name of the document within the system. Don't use any colons and remove ".PDF" from the name. Once you've edited the title, the system will automatically rename the file to reflect that.

4.8 Fill out the rest of the fields as you see fit and save the record. You can save the record by either moving to another record, moving to a control outside the list or, most conventionally, by clicking on the little pencil thing at the left hand side of the record.

Capture_nmcxhw.png
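
Steps 4.4 to 4.7 amount to a move followed by a rename. A hedged sketch of that sequence in Python, with hypothetical function names and an assumed "LIB" plus six digit prefix (the real database does this from inside Access):

```python
import shutil
from pathlib import Path


def file_document(source: str, destination_folder: str, number: int) -> Path:
    """Move a PDF into the library, using its old file name as a provisional title."""
    src = Path(source)
    target = Path(destination_folder) / f"LIB{number:06d} {src.stem}{src.suffix}"
    if target.exists():
        raise FileExistsError(target)
    shutil.move(str(src), str(target))
    return target


def retitle_document(path: Path, number: int, new_title: str) -> Path:
    """Rename the stored file once the user has edited the title."""
    clean_title = new_title.replace(":", "-").strip()  # no colons in file names
    target = path.with_name(f"LIB{number:06d} {clean_title}{path.suffix}")
    path.rename(target)
    return target
```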


5.0 The Lock field

Whenever you save or leave a record, the system will automatically "lock" it. When it's locked, you can't mess with the record until you go to that record and uncheck the "lock" field. This will annoy the crap out of you initially. Trust me though, you want it this way. It keeps you from messing up the integrity of the data that you've worked hard to assemble.

Capture_fu20ie.png


6.0 The Publisher field

The combo box for this field will contain all of the publishers that you've previously entered as options. This way "John Wiley and Sons" can always be precisely "John Wiley and Sons" and never anything else.

7.0 The "Location" field

I use this field to indicate where the document is. It's mostly intended for physical documents. If it's a PDF, I choose WORK. If it's at home, HOME. If it's a real book stored at work, WORK. If I loaned it to my colleague Kevin, then I type KEVIN. The combo box runs off a predefined list that you can edit as shown below. You're not limited to the list though, so you can enter one-offs like "KEVIN" without a bunch of overhead.

Capture_wgui0y.png


8.0 The "Tags" field

This is the good stuff. And the important stuff. We'll talk about how you set up your tag options later. For now, we'll just look at how you add them to a record.

8.1 Go to the "Tag Categories" list box and select the category that contains the tag that you're interested in. "Materials" maybe. Select "Materials" and the "Tags" list will repopulate accordingly.

8.2 Find the tag that you want in the "Tags" list and double click it. It will be added to your record. Add as many tags as you care to this way. Double clicking a tag that already has been added will cause that same tag to be deleted.

Capture_t5znom.png
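
The double click behaviour in 8.2 is a simple toggle. A tiny Python sketch of that logic, with a hypothetical function name and a plain set standing in for the record's tag list:

```python
def toggle_tag(record_tags: set, tag: str) -> set:
    """Return a new tag set with `tag` added if absent, removed if present."""
    updated = set(record_tags)  # copy so the caller's set is left alone
    if tag in updated:
        updated.remove(tag)
    else:
        updated.add(tag)
    return updated
```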


9.0 Setting up your custom tagging system

The system comes pre-loaded with my tagging system. If you're serious about this, you'll want to ditch it and spend some time coming up with your own system that works how you need it to.

9.1 Right click over the "Tag Categories" list box and, in the context menu that appears, select "Edit List Items". In the editor that pops up, set up your tag categories as you like. Note that you cannot delete a category that contains tags. You can only edit it. To delete a populated category, you need to delete or move the tags that it contains.

Capture_owcaj8.png


9.2 Right click over the "Tags" list box and, in the context menu that appears, select "Edit List Items". In the editor that pops up, set up your tags as you like. Each tag must be assigned to a pre-existing category. Optionally, each tag can have a "Parent Tag". That's best explained through example. I have a tag that is "base plates". Its parent tag is "connections". If I tag a document with "base plates", I'll get "connections" too automatically. No biggie, just a minor time saver.

Capture_jlry44.png
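
A small Python sketch of the parent tag idea, for illustration only. The lookup table and function name are hypothetical. Following the chain upward through more than one level is an assumption, since only a single parent per tag is described here.

```python
# Hypothetical parent lookup built from examples given in this thread.
PARENT_OF = {
    "base plates": "connections",
    "precast": "concrete",
    "shear wall": "lateral",
}


def with_parents(tag: str, parent_of: dict = PARENT_OF) -> list[str]:
    """Return the tag followed by its parent, grandparent, and so on."""
    result = []
    seen = set()
    current = tag
    while current is not None and current not in seen:  # guard against loops
        seen.add(current)
        result.append(current)
        current = parent_of.get(current)
    return result
```

Tagging a document "base plates" would then apply ["base plates", "connections"], while a tag with no parent comes back on its own.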


10.0 Searching and filtering your library

This is what you came for. Time for some cake.

10.1 Searching by tags. Go to the "Document Tags" list box at the bottom right and highlight the tag that you want to search by. Then click the "Apply Filter" button. The document list will be pared down to just the documents that have been tagged with the tag that you selected. You can continue to filter by additional tags. The filtering is cumulative. If you wanted to look at concrete shear walls in seismic zones, you might filter as follows: SHEAR WALLS --> CONCRETE --> SEISMIC. In my system, this would get me from 2000 documents to a handful in a hurry.

Capture_o5xqew.png
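
To make "cumulative" concrete, here is a minimal Python sketch of the narrowing in 10.1. The four sample records and the function name are made up for illustration. Each pass works only on what the previous pass left behind.

```python
# Made-up sample records; a real library would hold thousands.
LIBRARY = [
    {"title": "Doc A", "tags": {"SHEAR WALLS", "CONCRETE", "SEISMIC"}},
    {"title": "Doc B", "tags": {"SHEAR WALLS", "CONCRETE"}},
    {"title": "Doc C", "tags": {"SHEAR WALLS", "MASONRY"}},
    {"title": "Doc D", "tags": {"STEEL", "SEISMIC"}},
]


def apply_filter(documents: list, tag: str) -> list:
    """Keep only the documents in the current list that carry the given tag."""
    return [doc for doc in documents if tag in doc["tags"]]


# SHEAR WALLS --> CONCRETE --> SEISMIC, one pass at a time.
step1 = apply_filter(LIBRARY, "SHEAR WALLS")
step2 = apply_filter(step1, "CONCRETE")
step3 = apply_filter(step2, "SEISMIC")
```

With this sample data the list goes from four documents to three, then two, then just Doc A.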


10.2 Searching via the "Search" button. Type something in the text box and click the magnifying glass button. The search algorithm will search all fields of all records and return the documents that contain your search term. In particular, note that the "Comments" field will be searched. This can be powerful if you enter your document comments with a strategic eye towards later retrieval. I'll use it to:

- Type in the name of the person who led me to the document.
- Record the alternate versions of the name of the document (NIST 1257 vs Raft Foundations etc).
- I might indicate that I found the document posted here on eng-tips.
- Most importantly, I'll include a few terms that reflect why I thought the document was important.

Capture_jqvyq9.png
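
A rough Python sketch of an "all fields" search, to show why strategic comments pay off. The record layout, field names, and function name are hypothetical, and ignoring upper and lower case is an assumption about how the real search behaves.

```python
def search_records(records: list, term: str) -> list:
    """Return every record where any field contains the search term."""
    needle = term.casefold()
    return [
        record
        for record in records
        if any(needle in str(value).casefold() for value in record.values())
    ]


# Made-up records for illustration.
RECORDS = [
    {"title": "NIST 1257", "publisher": "NIST",
     "comments": "aka Raft Foundations. From CELinOttawa on Eng-Tips."},
    {"title": "Steel Base Plates", "publisher": "AISC", "comments": ""},
]
```

Searching for "raft" or "celinottawa" finds the first record even though neither word appears in its title.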


10.3 Native Access search and filter functions. Access can search and filter in dozens of handy ways. There are too many to describe here. Right click over any field and the context menu will be pretty self explanatory.

10.4 Combination searches. You can use methods 10.1, 10.2, and 10.3 concurrently as part of the same search. For example, you could filter tags as SHEAR WALLS --> CONCRETE --> SEISMIC and then perform a search button hunt for "Portland" to further narrow the field to a document that you know was authored by the Portland Cement Association. You can use the various methods in any order.

10.5 Clearing all filters. No matter which search method you've used, you'll eventually want to get back to the full list of documents. To do this, click the "Clear Filters" button at the bottom right.

Capture_v5xnad.png


10.6 Demonstration. In my next post, I'll upload a version of the system that contains a couple thousand records. Use this to futz around with the search functions and see if this is something that would work for you. Obviously, you won't actually have all of the linked PDF files so you won't be able to access those.

11.0 Retrieving your files for viewing

Click on the "Link" button beside the file that you would like to view. The system will track it down and display it using your computer's default PDF viewing software.

Capture_pwn2hp.png


 
As I mentioned at the end of my last post, here's a version of the database that contains a couple thousand records. Use this to futz around with the search functions and see if this is something that would work for you. Obviously, you won't actually have all of the linked PDF files so you won't be able to access those. Don't anybody get excited about copyright issues. No docs included.

Note that this is not the version of the database that you should use to start your own library. Instead use the cleaned out version that I attached to my last post (instructions).

 
 http://files.engineering.com/getfile.aspx?folder=a1a03936-d077-4538-988a-f405d24ae505&file=Structural_Library_-_SEARCH_DEMO.accdb
IRStuff said:
Might I suggest posting this as an FAQ? That way, it can always be found.

Great idea. First I'll need to figure out the technology though. Right now, it's time for Dalwhinnie and sleep...

 
You may want to set up a Google Doc so that people can add to it and it will compile as you go. Make sure it is restricted so that only verified emails can edit.
 
I have managed to find a free program that allows its user to change (at least some of the) metadata tags in a PDF file. The program is "PDF-Exchange Viewer", and it can be downloaded from

I have tested this with the standard "Title", "Author", "Subject" and "Keywords" tags, and it works for all four. In all cases the entered tags are indexed by Copernic and the documents are findable through Copernic. The tags are also visible via File>Properties>Details in Windows Explorer. I have not yet tested any other tags.

This may be of relevance to your endeavours.
 