December 05, 2007

Last blog entry: Course Review

Looking back on this one semester of BIT 330 on web-based searching and information trapping, I want to compare what I learned with what I had expected of the course, and also focus on the tools I consider most important.

What I wanted to get out of the class was to improve my search skills, to profit both in academia - e.g. when writing papers - and in professional life, where research is often a big part of the job, especially in the early years. Given that my knowledge of searching was limited, as were the tools I used before taking the class, I felt a real need in this area. The tools I used were mainly Google, rarely Yahoo, and their respective specialty search engines such as Google Images.

Still, even within the "big two" there were many functions I was not aware of before. These include email alerts (Google Alerts, Yahoo Alerts), Yahoo Directory, Google Blogsearch and Google Scholar. So even within Google and Yahoo there is a lot to discover that helps once you go beyond very basic queries.

Google Scholar brings me to the part of the class focusing on the Deep Web and, connected to that, academic search engines. Although I got to know more tools to access this invisible part of the web, this section could have been a bit more extensive, especially regarding one of my goals for the class, which was to make searching for scientific material easier. I only got a sneak preview of how much useful material is out there in this part of the web, and as it takes some time and support to get into the respective tools, I would have appreciated deeper coverage.

However, there was a large set of tools of which I had no idea at all before and that can make life much easier, especially when keeping track of a subject. I am speaking about RSS feeds, email alerts and page monitors. In this large group there are several very useful tools I will definitely continue to use after the end of the course. The first is a feed reader, Bloglines, which helps you aggregate your feeds and follow several topics and sources at a time without having to browse to several pages; you just get condensed information. Once you start using it, you will be surprised how many sites out there offer RSS feeds! In connection with RSS feeds, Yahoo Pipes is also an amazingly powerful tool to filter general RSS feeds by keywords, aggregate feeds and much more. Email alerts are also a nice tool, but I did not like them as much, as I prefer to choose when to consume the information, which is possible with RSS feeds, rather than getting regular mailings on a topic. Depending on your personal preferences, they might still be very useful, especially when using tools like Google Alert that offer many advanced filtering and configuration options.

Another set of tools I want to highlight is multimedia search tools, especially image search tools. Flickr and Getty Images are sites I had heard of before but never really used when searching for images. Flickr offers an amazing variety of images and, best of all, for free. Since pictures are tagged, you can search for general terms and find related images, often with better results than with the "usual suspect", Google Image Search. If you are willing to pay, i.e. mostly in a professional context, Getty Images finds you great pictures for any purpose. I was really impressed by the quality of the images they have.

Not a real "tool" in and of itself but nevertheless a big plus of the class were general search principles. Learning about some tricks and ways to tweak queries makes searching on basically all search engines far more efficient and successful. Of course I used some of the special search syntax before but there is a lot to learn in this field.

To finish, some general words on the class. I generally liked the interactive way of learning, with many exercises to play around with the various tools and sites, which is in my opinion better than just being presented with tools and how they work. However, at some points in the semester there were simply too many of those tools without enough time to focus on all of them. During some lectures I almost felt a bit lost, as so many tools were shown that it was sometimes hard to remember on which sites I had already signed up for an account and which were new. In my opinion, limiting the number of tools and discussing them in more depth would be better.

Following this practical approach, there are also the two topics to cover for the Term Project. I think they are a nice way to get students working with the tools; however, I do not see the added value of having two topics instead of just one. Putting more effort into one subject, rather than doing some things twice without additional learning effect, would be even better, I think.

To sum up, the class gave me a great toolkit for searching the web and I am sure I will be able to search better in the future. The Course Homepage will stay in my bookmarks for sure, as it provides an amazing overview and a place to go when I want to look up how to approach searching for a certain kind of information.

Posted by fhofmann at 05:37 PM | Comments (0)

December 03, 2007

Image search tools and methods

Today I am writing about different tools to search for images which might be more useful than the "standard" approach of going directly to Google. Indeed, they provide better results and better access to them, and thus should be preferred when you are on the hunt for this type of multimedia. I tried searching images for both of my term projects; however, I focussed more on my private topic (UEFA Euro 2008, European soccer) as my other topic (US real estate bubble) does not seem to be an ideal candidate for an image search.

Google Image Search cannot be left out of such a comparison. As the in-class survey has shown, it is the most widely used tool for searching images, at least among the BIT 330 students. The engine returned a decent quantity of results (689 for the query "uefa euro 2008") ranging from pictures of players, stadiums and officials to logos and ads. The results are presented as thumbnails, together with the image size, the website where the image was found and the filetype. A feature I like and use very much is the filtering for different sizes, especially relevant when you need an image for e.g. a presentation and require a certain resolution. Moreover, the site supports the general Google search syntax and adds filters for size, filetype and colors. Another positive point is that the preset for the filtering (Safe Search) is moderate and in this mode filters out any explicit adult images. Setting the level to "strict" also filters explicit text. An interesting idea on Google Image Search is the "labeling", where you attach labels to pictures together with a randomly assigned partner and receive points if the labels match. In this way, Google tries to improve its search algorithm, as the tags are used to find images. As a general site to start with, Google Image Search is a very good focal point.

Getty Images is an image agency that sells images to e.g. advertisers. After entering your query you can (and have to) choose what type of images you want: creative images, editorial images or footage. I excluded footage as I was searching for images. The quality of the images was better than with the other search engines, and the quantity was also decent (454). Images looked really professional (e.g. action shots, portraits) and are generally available in high resolution. However, you cannot get the images without paying (only a smaller preview with a watermark). Getty Images offers many advanced search options, surpassing the other sites: e.g. you can restrict the search to horizontal or vertical images, and you can select the date of the image, sources, number of people in the picture, location and many more. Images are tagged very well, so that you can also search for general terms such as "health" or "wellness" and get great results. Getty is definitely the state-of-the-art choice when searching for images in a professional context where you are willing to pay for them.

Flickr is a photo community that also relies on tags to search for photos. Individuals can upload their pictures and attach tags to make them easier to find. Quality is ok, though not as professional as with Getty, probably because mostly amateurs use the platform without monetary incentives. Some advanced search options are available, but they are rather limited. The resolution is on a medium level (rarely more than 1024 pixels on the longer side), which is still definitely enough for presentations. Conceptual pictures ("people working on a computer") are not that easy to find; many pictures are just personal snapshots. However, as pictures are available for free and there are additional community features, Flickr can always bring up some unexpected pearls.

Posted by fhofmann at 06:24 PM | Comments (0)

November 14, 2007

Blog Search Site Comparison

This post compares different blog search sites regarding quantity, quality and overlap of search results.
To perform the comparison, I ran a query for "behavioral finance" on each site and compared the results delivered.

Technorati delivered 138 results for my query, which sounds like a decent number; however, the quality/relevance of the first 20 results (those are the ones I had a closer look at) was mediocre. Pages were not entirely unrelated to the search, but still did not really help to find out more about the topic. Moreover, some results appeared twice. Results referred somehow to behavioral finance, but mainly discussed other topics. As the query is rather unspecific and broad, I would expect results that give a big picture, and there was only one very good result that did so.

Google Blogsearch was the search site with the largest number of results - 6,206 - and the quality of the result pages was fine as well. The feature of sorting by relevance instead of date seems to deliver very good results. Around 75% of the results were useful for getting a first understanding of the topic, although they were often rather brief and not always very deep or broad. Interestingly, Google Blogsearch delivered the only really good result Technorati had as its second result.

Bloglines is the last blog search site included in my comparison and delivered 1,270 results, which puts it between the other two search engines. However, the quality of the results was the poorest among the three search sites. E.g. one result that sounded promising ("Why smart people do stupid things with money : overcoming financial dysfunction") was just a reference to a public library that has the book.

In general there was almost no overlap: only one result was found by both Technorati and Google Blogsearch, and the remaining ones were found exclusively by one site. This shows how fragmented and difficult searching for blog entries is, and that search algorithms and searched sources seem to differ largely among the different sites. Even for a simple query - without any sophisticated syntax - results differ greatly.
As a take-away, this tells me that using different search sites when searching for blogs adds value, as every site gives different results. In my sample, Google Blogsearch was the site to go to first, followed by Technorati and Bloglines. Still, I guess that this recommendation cannot be generalized and might differ for other queries and topics.
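
As a rough illustration, the overlap check I did by hand could be sketched in Python like this. The function name and the URLs are made-up placeholders, not the real results of my query; the idea is just to treat each site's results as a set and intersect the sets pairwise.

```python
from itertools import combinations

def pairwise_overlap(results):
    """Return {(site_a, site_b): set of shared URLs} for every pair of sites."""
    overlap = {}
    for (name_a, urls_a), (name_b, urls_b) in combinations(results.items(), 2):
        overlap[(name_a, name_b)] = set(urls_a) & set(urls_b)
    return overlap

# Hypothetical placeholder URLs, not the actual result lists.
results = {
    "Technorati": ["http://example.com/a", "http://example.com/b"],
    "Google Blogsearch": ["http://example.com/b", "http://example.com/c"],
    "Bloglines": ["http://example.com/d"],
}

overlap = pairwise_overlap(results)
for pair, shared in overlap.items():
    print(pair, len(shared))
```

Comparing exact URLs is the simplest choice; it would miss the same post reached through two different addresses, so the real overlap could be slightly higher than such a check reports.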

Posted by fhofmann at 07:27 PM | Comments (0)

October 31, 2007

RSS Feeds

In order to follow information on both my personal and my business topic I have set up a range of RSS feeds to keep track of upcoming information. Now I want to give an overview of those I found most helpful and most remarkable. One general word upfront: I really like RSS feeds and prefer them to email alerts because I can choose when to read them, do not get my inbox crowded, and find it clearer and more convenient to read feeds with a feed reader such as Bloglines than to read news in the form of email alerts. But that is just a personal preference.

The first source of RSS feeds I used was Google News, the usual search for news, which allows you to create an RSS feed from the search results that then gives you all new results for the search query as soon as they come up. The feed delivers a considerable number of hits per day, but setting up precise search queries allows you to adjust the number of results. Once I had settled on a query, the feeds from Google News were really helpful and mostly relevant. Even if, of course, not every entry matches perfectly, there are hardly any completely useless results. However, the quality of these feeds largely depends on the query you set up initially. One drawback is the multiple delivery of the same news if several news pages come up with the same press release. Unfortunately, it seems as if the Google News feed cannot filter those out.
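
To illustrate how such repeated entries could be filtered after the fact, here is a minimal Python sketch. It assumes the feed entries are already available as dictionaries with a "title" field, and the headlines are invented; it simply keeps the first entry for each headline after ignoring case and punctuation. This is my own illustration, not a feature of Google News.

```python
import re

def normalize(title):
    """Lowercase a headline and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def drop_repeats(entries):
    """Keep the first entry for each normalized title, in original order."""
    seen = set()
    unique = []
    for entry in entries:
        key = normalize(entry["title"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique

# Invented headlines standing in for one press release picked up by two sites.
entries = [
    {"title": "UEFA names Euro 2008 referees", "source": "site-one.example"},
    {"title": "UEFA Names Euro 2008 Referees!", "source": "site-two.example"},
    {"title": "Ticket sales open for Euro 2008", "source": "site-one.example"},
]

unique = drop_repeats(entries)
for entry in unique:
    print(entry["source"], "-", entry["title"])
```

This only catches headlines that are the same apart from case and punctuation; a press release that each site retitles would still come through more than once.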

I do not want to go into the details of all the options Yahoo Pipes offers but just write briefly about the feeds I created with it. I basically used it to aggregate several different feeds, so the result is actually not influenced by Yahoo Pipes but more by the inputs you are giving it. Still, I wanted to mention it here, because I found it a useful tool if you do not want to read a bunch of feeds but have them all together in one, sorted e.g. by date or author, which also allows you to spot duplicate entries faster than when reading two feeds sequentially.
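
The kind of aggregation described above can be sketched in a few lines of Python. This is not how Yahoo Pipes works internally; it is only an illustration with made-up entries, assuming each entry carries a "published" timestamp.

```python
from datetime import datetime
from itertools import chain

def merge_feeds(*feeds):
    """Combine several lists of entries into one list, newest first."""
    return sorted(
        chain.from_iterable(feeds),
        key=lambda entry: entry["published"],
        reverse=True,
    )

# Made-up entries standing in for two subscribed feeds.
news_feed = [
    {"title": "Squad announced", "published": datetime(2007, 10, 29, 9, 0)},
    {"title": "Coach steps down", "published": datetime(2007, 10, 30, 18, 30)},
]
blog_feed = [
    {"title": "Stadium preview", "published": datetime(2007, 10, 30, 8, 15)},
]

merged = merge_feeds(news_feed, blog_feed)
titles = [entry["title"] for entry in merged]
print(titles)
```

Sorting the combined list by date puts entries about the same event next to each other, which is why duplicates become easier to see.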

For my personal topic I used the official UEFA (Union of European Football Associations) website, which offers an RSS feed dedicated to the European Soccer Championship 2008 that I subscribed to. I registered for both the German and the English version and, interestingly enough, even though the entries were largely the same, there were also differences. There was no clear pattern, meaning some news items were published exclusively in one version or the other. The feed covers all ongoing information very extensively, focusing mostly on injuries and changes in teams and coaches. There is only little background information on the tournament. Still, for the sporting side of the topic, it is a great resource. However, all the information can also be received by using e.g. the Google News feed, as it covers news on the UEFA website.

For my business topic I subscribed, among others, to the Residential Real Estate News RSS feed offered by the Real Estate Journal. Even if it is not dedicated to my specific topic, it is a great resource for general information on the subject and in addition often gave relevant results. As the number of news items is not too large (usually two to three per day), it can easily be followed as a general source of information in that field. Articles are of high quality and trustworthy. To make the feed more relevant, one could use Yahoo Pipes to further filter the results by keywords.
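
A keyword filter of this kind might look roughly like the following Python sketch. The entries and keywords are invented, and I assume each entry has a "title" and an optional "summary"; an actual Yahoo Pipes filter is configured graphically rather than written as code.

```python
def filter_by_keywords(entries, keywords):
    """Keep entries whose title or summary mentions at least one keyword."""
    wanted = [keyword.lower() for keyword in keywords]
    kept = []
    for entry in entries:
        text = (entry["title"] + " " + entry.get("summary", "")).lower()
        if any(keyword in text for keyword in wanted):
            kept.append(entry)
    return kept

# Invented entries standing in for a general real estate news feed.
entries = [
    {"title": "Home prices fall in several markets",
     "summary": "Analysts discuss a possible housing bubble."},
    {"title": "New office tower opens downtown",
     "summary": "Commercial leasing remains steady."},
    {"title": "Subprime lenders under pressure", "summary": ""},
]

relevant = filter_by_keywords(entries, ["bubble", "subprime"])
for entry in relevant:
    print(entry["title"])
```

The match is a plain substring test, so a keyword like "bubble" would also match "bubbles"; that is usually what you want for a news feed, but it can let in unrelated entries for short keywords.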

Finally, I want to write about a blog whose RSS feed I subscribed to, the Housing Bubble Blog. The title of the blog sounded really promising as it covers exactly my business topic. However, as the posting frequency was rather low (roughly two entries per week), it is mainly useful for following the topic occasionally over longer time intervals. When attempting to follow daily news and developments, it is not that helpful. Moreover, the entries are not really objective and tend to reflect a rather particular opinion.

Posted by fhofmann at 09:22 AM | Comments (0)

October 30, 2007

Email Alerts

To keep up to date with my term project I have set up email alerts for both my personal and my business topic to follow news and information regarding these topics. Below I write about each alert service based on the alerts I received for both topics.

Yes, Google also provides an email alert service, called Google Alerts. To set up the alert, I just entered a detailed search query as I would for a usual Google search, so no adjustment is necessary once you know the Google search syntax, which can be used one to one when setting up the alert. Google Alerts offers several types of alerts; I tried "News" and "Blogs". Other available types are Web, Comprehensive, Video and Groups. For my search topics, I felt that News and Blogs would be the best. After selecting the frequency of the alert and entering your email address you are set.
The alerts you get look almost like a Google search result. A nice feature is that with every alert you can also see a tab with old alerts you got. The news alerts brought more results per day (3-10) than the blog alerts (2-5), and I also liked the relevance of the news alerts better.
In general, Google Alerts allows you to set up as many alerts as you like, edit them and also have alerts sent to other email addresses. I also like the feature of selecting the type of alert; however, the configuration possibilities are not extensive.

Google Alert sounds pretty much the same as Google Alerts but is a totally different service. It is not run by Google, and if you want unlimited use you have to pay for it. However, there is a free trial version that allows you to test the features but limits email alerts to a maximum of three.
When setting up the alert, you have a large variety of options and configurations to play with - many more than in the Google version. Some features I liked are hiding similar and old results and excluding pages from the search. Moreover, everything you have to enter in one query with Google can be entered separately here. This means there are lines for words to include, words not to include, "OR" options, the position of the search terms (title, body, URL, links) and document types to include. The advantage is that you do not have to be familiar with a specific search syntax but can build your query step by step. The alert is only sent when new information is available, but you select the maximum frequency.
The quality of the results was okay: large in quantity, but quality-wise a bit worse than Google Alerts because it searches the whole web and not just e.g. news. For some results, I could not see the actual information in the preview because the few lines just contained buzzwords and no summary of the news/information.
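
To show what such a form saves you from typing, here is a small Python sketch that assembles separate fields into a single query string using the common Google operators (quotation marks for a phrase, OR, a leading minus for exclusions, intitle:). The function and its parameter names are my own invention for illustration; this is not how Google Alert builds its queries internally.

```python
def build_query(include=(), exclude=(), any_of=(), phrase=None, in_title=None):
    """Assemble separate form fields into one search query string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase
    parts.extend(include)                      # words that must appear
    if any_of:
        parts.append(" OR ".join(any_of))      # alternatives
    if in_title:
        parts.append(f"intitle:{in_title}")    # term restricted to the title
    parts.extend(f"-{word}" for word in exclude)  # words to leave out
    return " ".join(parts)

query = build_query(
    phrase="real estate",
    include=["bubble"],
    any_of=["housing", "mortgage"],
    exclude=["commercial"],
)
print(query)
```

With the example fields above this prints "real estate" bubble housing OR mortgage -commercial (with the phrase in quotation marks), which is the single line you would otherwise have to write by hand.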

Yahoo Alerts was another email alert service I tried. It offers some predefined alerts about general topics of interest such as "Market Summary" or "Breaking News" but also offers a "Keyword News Alert" to configure keyword-based alerts. This is what I used to follow my two topics.
Yahoo Alerts allows for the least configuration, as you can only select words to include and words not to include, and it only offers keyword alerts for the news category, which should still be sufficient in most cases. The quality of the alerts I got was pretty much comparable to that of Google Alerts.
One disadvantage is that you can only have alerts sent to your Yahoo Mail address and not to any address you want. As with Google Alerts, the number of alerts is unlimited and you can edit, delete and manage your alerts. An advantage, however, is the broad selection of alert types available, such as auctions, weather or traffic.

Posted by fhofmann at 05:53 PM | Comments (0)

September 21, 2007

RSS: Searching for feeds and blogs

After creating the Bloglines account and searching for feeds with the help of the tools Bloglines itself offers, I went on to some other search engines dedicated to finding blogs and feeds.

First, I started on Blogdigger, searching for "sustainable development", and ended up with fairly poor results. I could hardly find blogs or feeds that seemed meaningful to me. Another aspect that made this tool a bit less attractive is the number of Google ads, which are presented above the actual results and also on the whole right side of the site. However, I liked the convenience of adding feeds or even a whole search to my Bloglines account. For my second search, on digital Canon cameras, the results were better and helpful. Another nice feature Blogdigger offers is finding out who links to a certain website. This means that if you know a site you like and often use, you can find out who is referring to it and thus might write some relevant stuff for you.

It seems as if Google has a search engine for whatever you are searching for, and so they also have one especially for blogs. Google Blogsearch is still in the beta stage but already working decently. Interestingly, on Google Blogsearch there are no Google ads, while they have them on Blogdigger. Moreover, the blogs this engine found were of great relevance for the majority of my searches. A nice feature is filtering the results by how recent they are. Nevertheless, nobody is perfect, which in this case shows in the functionality for adding a blog to a feed aggregator such as Bloglines: I did not see a direct way of adding a search result to Bloglines. So I had to take a detour and go to the blog itself first before being able to add it.

The last useful tool I want to cover is Rojo, another advanced blog and feed finder. Rojo offers the option to search either blogs & news or feeds, a functionality I really liked as it helps to focus your search on what you want. Another function Rojo has is an overview of categories on the starting page, which makes it easier to find blogs in fields of interest without knowing the specific topic. Blogs I found in these categories were really good ones; however, overall result quality was more comparable to Blogdigger, so a bit behind Google Blogsearch but still acceptable.

So every tool has its right to exist and strengths to build on. Blogdigger convinced me with its ease in adding feeds to Bloglines and its general functionality. Google Blogsearch had the best results for my search, and Rojo has a great selection of blogs in some predefined, broad categories.

Posted by fhofmann at 12:10 AM | Comments (0)

September 16, 2007

Exercise: The Deep Web: Search Engine Comparison

Google: Searching with Google returns 1,870,000 entries; however, searching for "timber industry" california reduces the results to 229,000. One advantage of Google is that at least the first results do not differ that greatly when the quotation marks are left out. Thus, even if you do not know exactly how to formulate your query, the algorithm seems to present results where "timber" and "industry" are related first. The first 10 results are already of decent relevance and mostly deal with environmental topics and the influence of the timber industry in California. Results are both websites and PDF documents. However, if you want to get a quick overview of the industry, you should focus and narrow your query, as results are fairly broad. This also proved true when searching for the automobile industry in Michigan, although for that query a page on the industry's history also came up as the first result.

Yahoo Directory: A first and obvious difference is the number of results: 154, and only 17 when putting timber industry into quotation marks. This of course makes screening the results a lot easier. In addition, the engine shows categories related to the query, which might help to get the big picture and information about related subjects; for my queries, however, this was not that useful, as the categories did not offer information relevant to me. Results are again mostly related to environmental issues. One result was only a general business directory for the US, which was not really related to the query. The overall quality of the results for this query is in my opinion below average. The results for my second query were better and of more relevance to me.

Yahoo: The quantity of results is very much comparable to Google (2,930,000 without and 331,000 with quotation marks). In addition, the websites themselves are mostly the same. Results are again mostly on environmental topics; getting a broader overview requires either time or a more specific query. One negative aspect was a gift shop website within the first ten results. I do not want to get this kind of result when searching for an industry (it was not a sponsored link but a normal result).

Scirus: Scirus returns fewer results than Google and Yahoo (110,000 without and 10,000 with quotation marks), but quality is very good. Links appear scientific, and websites providing an overview of e.g. the history are among the top 10. It seems as if Scirus accesses different sources than the "usual suspects", searching in Journal Sources, Preferred Web Sources and Other Web Sources. A positive feature: the searcher can decide which of these categories to include in the search. Another nice feature is a list of other keywords one could search for in order to get related information.

Google Scholar: This engine returns 27,400 pages without and 3,000 pages with quotation marks. The aim of Google Scholar is to provide easy access to the Deep Web, but if the results returned are the best the Deep Web has to offer, then I would see no need to access it. In other words: results are of poor quality. Already among the first 10 websites there are some which are related not to the Californian but to the Russian timber industry. Still, I think Google Scholar can be worth a try, as it also searches sources that normal search engines cannot access. Sometimes some additional information may be found here.

UM Library’s Search Tools: The library search tool gives access to various databases and sources, but getting a specific document takes time. Most results are rather old. In addition, you cannot display results directly but really have to work to get them; sometimes you only see where to get or order them. An advantage is the possibility to focus your search on a specific field of study such as "Arts" or "Business + Economics". However, if you are not sure which is the best field, or there is more than one related field, you have to perform several queries. To avoid this, there is the power search. In general this is a good idea, as you can precisely select which sources you want to search, but it is very inconvenient to use, as you have to select source by source from dozens of possibilities. If you know how to use it and where to search it may be good; still, for my purpose in the exercise it was not so strong.

Complete Planet: The search does not really keep the promise of the name: No results!!! In addition it is very slow. No value at all.

Even if they do not search the Deep Web, looking at the results of this exercise, Google and Yahoo are in my opinion a good place to start a query. They find a lot of relevant information, even if it is a good idea to focus the query to increase the quality of the results. Scirus, however, proved to be a very good complement and returns relevant and profound information from different sources. The UM library database is probably the most powerful search engine regarding quality and reliability of results, but it requires some effort to get there.

Posted by fhofmann at 04:09 PM | Comments (0)

September 05, 2007

What do you want to get out of this class?

I hope taking this class will enable me to profit both in my academic life, e.g. when writing the Bachelor's thesis, and in my future professional life. I think finding reliable information quickly makes things a lot easier, especially in the first years of working. Additionally, one thing I would like to learn more about is how to get into a completely new subject or industry rather fast.

As more and more information is available and grows without control, I would like to get to know concepts of how to filter this information overload and find relevant information.

Posted by fhofmann at 03:47 PM | Comments (0)

First Test

Just testing if blogging works.

Posted by fhofmann at 03:38 PM | Comments (0)