Tuesday, December 10, 2002

The Problem:

As our research so far has shown, Google cannot understand exactly what the user is searching for from a "keyword" search alone, and
many users do not know enough about how search engines work to refine their search further. As a result,
Google returns many pages that have no relevance to the user's query. In addition, basing those results on a power law
determined by link popularity makes only a portion of the web accessible...sites are chosen not by their relevance to
the query but by their popularity.


Our Current Knowledge Base of Google, Extracted from Our Experiments and Realizations:

Ron's Weblog:

Keyword Experiment: Using keywords, Ron's weblog achieved a high placement in Google's rankings.
Chosen Keyword: hotdealsclub
Realization: His weblog had nothing to do with the keyword, yet Google still ranked his
page very high. This tells us that Google's current keyword search does not, on its own, provide the
user with relevant information.

Basic research on how a Google search works.

Marwan's Weblog:

Research on Google and Barabási:

Realizations:
- Google follows a rich-get-richer scheme, and its rankings follow a power law.
- The importance of link popularity in a Google search.
- The current form of Google limits the number of sites being seen, which is directly related to the
current architecture of a Google keyword search.

My Weblog:

Link Popularity Experiment: Monitored the ranking and link popularity of 4 websites under the same keyword: fitness.
The goal was to see how link popularity affected a Google keyword search and how much weight it carried
in the overall ranking system.
Realization: For the user to receive relevant information, and to be able to access
more of the web, Google must break the rich-get-richer scheme it employs and also move away from its
current system of a "keyword search".

Research which backs up the realization from the link popularity experiment

Connections between weblogs and our new concept of the "topic search engine", which will enhance Google's
current functionality and relevance.

Solution:

Using our previous experiments and research as our knowledge base, we plan to create a complex system
that could be incorporated into Google to enhance its keyword search capabilities. This complex system
is called a "topic search engine" (topic SE), and it will allow the "keyword search" to direct the search rather
than basing the whole search on the keywords themselves. Topic SEs are based on the idea of the weblog:
a weblog can be viewed as a catalog of topic-specific links and information, and millions of these weblogs are connected,
forming a complex system of weblogs. Our idea is to use the weblog and the weblog system as a model for connecting
millions of topic-specific SEs, each covering all aspects of a specific topic and containing links to those aspects.
When a user enters keywords, the keywords are analysed and separated into topics; the corresponding topic SEs then
look at the words and make connections amongst themselves. A topic SE may ask the user questions about a topic to give the
engine a better understanding of what the user is looking for. Once all the topic SEs understand the query, they sort through
their catalogs of topic-specific links and match them to the query. The user is then presented with links that match the
intended meaning of their keyword search. The advantage these topic SEs bring to the Google system is helping Google understand
what the user is REALLY searching for. This in turn will provide the user with very RELEVANT information and open up more
of the web, since the engine would then be based on relevance, not popularity.
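To make the routing step a little more concrete, here is a minimal Python sketch, assuming each topic SE is simply a catalog of links tagged with descriptive words. The class `TopicSE`, the function `topic_search`, and the example.org links are hypothetical illustrations, not part of Google or any real system. Matches are ranked by how many query words they share with a link's tags, not by link popularity.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSE:
    """Hypothetical topic search engine: a weblog-like catalog for one topic."""
    topic: str
    # Maps each link (a URL string) to the set of words that describe it.
    catalog: dict[str, set[str]] = field(default_factory=dict)

    def match(self, query_words: set[str]) -> list[str]:
        """Return links sharing at least one query word, most overlap first (ties by URL)."""
        hits = [(len(tags & query_words), url) for url, tags in self.catalog.items()]
        hits = [(score, url) for score, url in hits if score > 0]
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return [url for _, url in hits]


def topic_search(query: str, engines: dict[str, TopicSE]) -> dict[str, list[str]]:
    """Send the query to every topic SE whose topic appears among the keywords."""
    words = set(query.lower().split())
    return {topic: se.match(words) for topic, se in engines.items() if topic in words}


# Hypothetical catalog for a single "fitness" topic SE.
engines = {
    "fitness": TopicSE("fitness", {
        "example.org/strength-training": {"fitness", "strength", "training"},
        "example.org/nutrition-basics": {"fitness", "nutrition", "diet"},
    }),
}
print(topic_search("Fitness nutrition", engines))
# {'fitness': ['example.org/nutrition-basics', 'example.org/strength-training']}
```

Ranking by tag overlap is only a stand-in for "relevance"; a real topic SE would need a much richer notion of what a link is about.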

Goal for the end of the module:

To create a map of how this complex system of topic SEs will work and how it will fit into the current architecture of a
Google keyword search. This will give us a visual understanding of our proposed solution to the problem we identified.

Sunday, December 08, 2002

"Putting it Together: Google loves Weblogs

Weblogs are perfect for Google: frequently updated websites crammed chockfull of tasty links. It's no wonder that Google loves Weblogs so much.

Of course, if that's the case, why doesn't every Google search land the searcher on a blog? That question underscores a crucial point about weblogs and Google: weblogs are the voters in this political system. In other words, weblogs don't get elected by Google... but the sites they voted for do.

So even if you never visit a blog, you're being influenced by them. The collective votes of the weblog community are determining what sites you see on Google, the world's largest search engine."


My last post discussed moving to topic searching rather than keyword searching; here it's clear that weblogs can act as a tool that enhances Google and actually makes this shift happen. Weblogs generally relate to a specific topic, so the links and information on a weblog also relate to that topic in some way or another. Weblogs can therefore act as a sort of search engine themselves...but what if they actually were search engines? A weblog is a collection of links related to a topic, and Google can search weblogs by these topics and the links connected to them. If "topic-based search engines" were created that were as focused and detailed about certain topics as weblogs are, and they were used as a transition step between Google and your final results, your search would be topic dependent rather than keyword and link dependent. So....I am going to map characteristics of weblogs onto "topic search engines" that Google can use in the transition from the original keyword search to the final presentation of links. Right now I'm not sure how to structure this system as a complete search, but I know where this idea fits into the whole search picture! I'll keep ya posted...for now, use the concept of the weblog to envision how a topic search engine might function.

Possible Example of Functionality:

After keywords are searched on Google, one or more topic SEs would be contacted to assist the search. These topic SEs would know all the possible scenarios in which those keywords could be used. For example, if the user entered "car" as a keyword, the topic SE would know that cars can be bought, sold, and traded, that they can be new or old, etc....basically everything about cars. The user might then be presented with another window that helps them get a more relevant search, either by asking a question or questions about the keywords or by presenting those keyword-based scenarios. The number of topic SEs involved would depend on how many keywords are entered and how many of those keywords are separate topics. If more than one is involved, the topic SEs would be in dialog with each other, each developing possible search scenarios. After the search has been further defined, each topic SE would have all relevant sites categorized by those scenarios and could then really deliver relevant sites to the user. Once the user's search is this specific, link popularity would no longer be needed, because one of its main purposes is to stand in for relevance: a site is assumed to have lots of good information about a topic because many other sites about that topic link to it. With this method, keywords are treated as topics, and, like weblogs, the topic SEs are truly focused and knowledgeable about those topics. Having only one SE match queries to keywords contained in every site in the world is a little too general, to say the least.
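The clarify-then-match step described above might look something like this Python sketch. It assumes each topic SE stores its links under named scenarios, and it treats the "dialog" between two topic SEs as simply keeping the links that both engines file under the scenarios the user chose. The `SCENARIOS` table, the function names, and the example.com links are all made up for illustration.

```python
# Hypothetical scenario catalogs: topic -> scenario -> links. All URLs are made up.
SCENARIOS: dict[str, dict[str, list[str]]] = {
    "car": {
        "buy": ["example.com/new-car-listings", "example.com/used-car-listings"],
        "sell": ["example.com/sell-your-car"],
        "repair": ["example.com/repair-shops", "example.com/diy-repair-guides"],
    },
    "repair": {
        "car": ["example.com/repair-shops", "example.com/diy-repair-guides"],
        "bicycle": ["example.com/bike-repair"],
    },
}


def clarifying_question(topic: str) -> str:
    """Build the follow-up question a topic SE might show the user."""
    options = ", ".join(sorted(SCENARIOS[topic]))
    return f"What do you want to do with '{topic}'? Options: {options}"


def resolve(choices: dict[str, str]) -> list[str]:
    """Combine the links each topic SE files under the scenario the user picked.

    With several topic SEs, keep only links that every engine agrees on,
    a simple stand-in for the engines being "in dialog" with each other.
    """
    link_sets = [set(SCENARIOS[topic].get(scenario, [])) for topic, scenario in choices.items()]
    if not link_sets:
        return []
    return sorted(set.intersection(*link_sets))


print(clarifying_question("car"))
# What do you want to do with 'car'? Options: buy, repair, sell
print(resolve({"car": "repair", "repair": "car"}))
# ['example.com/diy-repair-guides', 'example.com/repair-shops']
```

Intersecting link sets is the simplest possible way for two engines to "agree"; weighting or merging the lists would be other options.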

What's the toughest part of improving searching?

"I think the hardest issue is determining what the user really wants, figuring out, when someone types in "car," whether he wants used cars. Does he want the Kelley Blue Book? Or does he want to buy a car? Understanding better what users want -- that's the hardest challenge. When a query is a little bit more specific -- take, for example, "car repair Palo Alto" -- then we can say, OK now, we sort of understand. But we're still not 100 percent sure. Does he just want different car repair places? Or does he want the one closest to his house? We do know that we should make sure not to return a page that's a report about a trip to California and then they had to have their car repaired in Palo Alto. You can try to return documents that are specifically on this topic. We're developing more sophisticated techniques to return documents that might not mention the query words, but are [still relevant to] the topic. We're getting away from just pure word matches and getting more into topics."

The last part of this quote is a very significant step toward envisioning a "new" way to search the web: "We're getting away from just pure word matches and getting more into topics." The toughest part of developing an SE is returning the pages the user actually wants, which keywords and link popularity alone won't provide. As seen in my previous experiment, link popularity and link quality play a major role in how Google ranks search results...however, the pages that made the top 4 are not nearly as relevant, or as good, as many sites I visit regularly for my fitness queries. This idea of topic searching really got me thinking about new structures that Google could follow as opposed to what it is doing now. People who search on Google are looking for something in particular, even if only in a general sense, but their choice of keywords may not let them find it. You should not have to be an expert to find what you are looking for, and the pages returned should not be ranked by popularity, for a number of reasons. First, many people know how to manipulate SEs, meaning a page could be returned that has no relevance to the user's query at all. Second, just because a page is popular for a specific topic doesn't mean it is what the user is looking for. The car example above is a good illustration.

Monday, December 02, 2002

A Graph Representing the Results from the Google Monitoring Experiment...




The numbers corresponding to each date and placement are the link popularity results recorded on that day. As you can see, fitnesslink.com had the most significant increase in Google link popularity and also experienced the greatest advancement in rank over the two-week period. More about our results later....
Google Results 12.01.02




Link Popularity Results 12.01.02


Google Results 11.28.02




Link Popularity Results 11.28.02


Google Results 11.25.02




Link Popularity Results 11.25.02


Google Results 11.23.02




Link Popularity Results 11.23.02


Google Results 11.20.02




Link Popularity Results 11.20.02


Google Results 11.17.02

This is the first set of results from our Google Monitoring Experiment. The goal of this experiment was to select a keyword (in this case I selected "fitness") and to monitor the first 4 websites returned by a Google search for it over two weeks. I also monitored the link popularity of those sites on the same days and analyzed the overall results. The images are screenshots of the results, labelled with the date of each check over this two-week period. There are 6 sets of results in total.




Link Popularity Results 11.17.02




Tuesday, November 12, 2002

Change in Focus

It looks like I have changed the focus of my weblog. Originally I was going to focus partly on fitness and bodybuilding, but as it turns out I have related most of these concepts to business in some form or another, so my new focus (actually pre-existing) is now business.
A Problem With My Weblogging

Reflecting on what I have done in this course to date is a little upsetting.
I think the greatest problem I face with my learning in this course so far is using the weblog format effectively. I know the purpose of the weblog is to help us collectively build our understanding of the various aspects and characteristics of networks and networked culture, which I think was a very creative approach to the whole idea, don’t get me wrong. The problem I have, and so do many others, is that the weblogs are treated like section conferences without the discussion afterwards. We are using the weblogs to write down our understanding of the concepts, but I don’t find much of the integration or discussion that makes online learning valuable. I just make sure that I post about the unit concepts and some research I have done into them, and that’s all. No discussion with any other students….nothing; it’s like a journal. I think this way of using the weblogs is very ineffective and rather boring, but it is how I have been using mine, because that is how I view the nature of weblogs…it feels like I am talking to myself, which is not fun or engaging. Because of this, I don’t take pride in my weblog, nor do I want to make it look good. Posting in this thing feels like a chore; it takes me hours to even start because I keep putting it off. One suggestion that I think might help with this problem is for the instructor to make our weblogs more defined and structured. They are so open and vary so much from one weblog to the next that I don’t even know if I’m doing this right, which is not a good thing two modules into the course. Anyway, for now I’ll keep on doing whatever it is that I am doing, and hopefully this weblog turns out ok…

P.S. I like your latest post Dale.

Thursday, October 31, 2002

Popularity Programs

A key ingredient of the power law within large networks is link popularity, so I feel it is important to discuss it with regard to this week's material.

“Link popularity is basically the number of links that point to your web site. Besides the optimization of your web pages for meta tags and search engine readiness, link popularity is considered as being a major factor for your ranking in the search engines”
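Taken literally, that definition is just an inbound-link count (the in-degree of a node in the link graph). Here is a minimal Python sketch, assuming the links are given as hypothetical (source, target) pairs. Ignoring duplicates and self-links is a simplifying choice of this sketch, not a claim about how any particular search engine counts.

```python
from collections import Counter


def link_popularity(links: list[tuple[str, str]]) -> Counter:
    """Count distinct inbound links per site from (source, target) pairs."""
    unique = set(links)  # the same page linking twice counts once
    return Counter(target for source, target in unique if source != target)


# Hypothetical link data.
links = [
    ("a.com", "fit.com"), ("b.com", "fit.com"), ("d.com", "fit.com"),
    ("a.com", "fit.com"),            # duplicate, ignored
    ("fit.com", "fit.com"),          # self-link, ignored
    ("c.com", "gym.com"), ("fit.com", "gym.com"),
]
print(link_popularity(links).most_common())
# [('fit.com', 3), ('gym.com', 2)]
```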


Not only is it a major factor in search engine ranking, it is also how scale-free, power-law distributed networks differentiate and rank nodes. This is a basic part of internet topology, which plays an important role in generating traffic and getting people to view your site. One phenomenon that has emerged from this network structure is the creation of popularity programs: small networks whose members all share and distribute each other's links in an attempt to gain more power on the web.

Examples:
Example 1
Example 2

“The current generation of "link popularity" is much more sensitive to the context in which the site is listed. In theory, link popularity now takes into consideration:
• The theme of sites linking to yours - your site is judged by the type of site linking to yours, and whether it is relevant to the theme of your site. The more relevant sites, the better.
• How those sites link to yours - they look at the words used in the link to your site; the more relevant keyword phrases, the better.
• The link popularity of sites linking to yours - A "popular" (judged by link popularity) site linking to yours is significantly more important than unpopular sites. Thus a link from Yahoo counts for a lot.”
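The third point, that a link from a popular site counts for more, is the core idea behind PageRank as published by Google's founders. The Python sketch below is the standard textbook power-iteration version with the commonly cited damping factor of 0.85. It is not Google's current production ranking, and the site names in the example are hypothetical. Each page splits its score evenly across its outgoing links, so site_a (linked once from a well-linked "yahoo" node) ends up ahead of site_b (linked once from an unlinked blog), even though both have exactly one inbound link.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank computed by power iteration.

    graph maps each page to the pages it links to. A page's score is split
    evenly over its outgoing links; pages with no outgoing links spread their
    score evenly over every page, so the scores always sum to 1.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical link graph: three blogs vote for "yahoo", which links to site_a;
# one blog links to site_b.
graph = {
    "blog1": ["yahoo"], "blog2": ["yahoo"], "blog3": ["yahoo"],
    "yahoo": ["site_a"],
    "blog4": ["site_b"],
}
ranks = pagerank(graph)
print(ranks["site_a"] > ranks["site_b"])  # True
```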


Today, link quality has become increasingly important in search engine rankings, which in turn has a direct impact on the power-law structure. Popularity programs have adapted to this.
Scale-Free, Power Law and Large Networks

Many large networks, such as the World Wide Web (or really large networks in general), share common traits and similar structures when we compare these masses of linked nodes. Two commonalities that appear to be evident among many large networks are power-law and scale-free structures. In a scale-free structure, a few nodes have a large number of connections, or links, to other nodes, whereas most other nodes have only a few connections each. This is directly related to the power-law distribution: a small number of large websites (in the case of the internet) gain the most links and traffic, which makes them larger still and leads to the rich-get-richer phenomenon.

“Previous research has shown that this structure can be caused by a sort of perpetual rich-get-richer dynamic that says the larger a node is, the more likely it is to attract links.”
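That dynamic is usually modelled as preferential attachment, as in the Barabási-Albert model. Below is a simplified Python sketch (assuming, for simplicity, that each new node adds exactly one link) that compares it with uniform random attachment. Sampling uniformly from the list of link endpoints picks an existing node with probability proportional to its current number of links, which is the rich-get-richer rule in code.

```python
import random


def grow_network(n_nodes: int, preferential: bool, seed: int = 0) -> list[int]:
    """Grow a network one node at a time and return every node's degree.

    Each new node adds a single link to an existing node. With preferential=True
    the target is chosen in proportion to its current degree (rich get richer);
    otherwise every existing node is equally likely.
    """
    rng = random.Random(seed)
    degree = [1, 1]     # start from two nodes joined by one link
    endpoints = [0, 1]  # each link lists both ends, so a node appears once per link it has
    for new in range(2, n_nodes):
        if preferential:
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new)
        degree.append(1)
        degree[target] += 1
        endpoints.extend([new, target])
    return degree


n = 10_000
rich = grow_network(n, preferential=True)
flat = grow_network(n, preferential=False)
print("largest hub, preferential:", max(rich))
print("largest hub, uniform:", max(flat))
```

The exact numbers depend on the seed, but the preferential network typically grows a hub with many times more links than anything in the uniform network, while most nodes in both keep only one or two links.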

In building up these powerhouse nodes that control the WWW, we see small-world networks being incorporated into marketing schemes. Site promoters and web marketers trade links with one another to gain popularity within the larger network as a whole. The more links that are traded within such a small network, and the larger that small network is, the more connections each member node will have. This has a tremendous impact on their placement in search engines, the holy grail of internet traffic. This is where the rich-get-richer dynamic clearly becomes the motivation for many web developers.
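Here is a toy Python sketch of that effect, assuming a hypothetical "popularity program" in which every member links to every other member. With k members, each one gains k - 1 inbound links, so even a small exchange can out-count a site that earned its links organically. All site names are made up, and the raw count ignores the link-quality weighting described in the earlier quote.

```python
from collections import Counter
from itertools import permutations


def inbound_links(links: set[tuple[str, str]]) -> Counter:
    """Count inbound links per site, ignoring self-links."""
    return Counter(target for source, target in links if source != target)


def join_exchange(links: set[tuple[str, str]], members: list[str]) -> set[tuple[str, str]]:
    """Hypothetical link exchange: add a link from every member to every other member."""
    return links | set(permutations(members, 2))


# Organic links: two readers link to a popular site, one links to shop_a.
organic = {("reader1", "big_site"), ("reader2", "big_site"), ("reader3", "shop_a")}
members = ["shop_a", "shop_b", "shop_c", "shop_d", "shop_e"]

before = inbound_links(organic)
after = inbound_links(join_exchange(organic, members))
print(before["shop_a"], after["shop_a"], after["big_site"])  # 1 5 2
```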

“Nearly everyone (85% of Internet users) begins his or her Web travel at a major search engine. Millions of people visit the most popular search engines every day and when people find your pages using search engines, it is because they are looking for the type of company, service or product you offer.”

“The real value of links are that they increase your popularity rating in the major search engines that track how many external sites link to yours. The idea is that your site must be good if a whole lot of other sites link to it.”


The more links a node hosts or gains, the more power it receives within the network, establishing the power law as a dominant structure of the WWW. It becomes increasingly important to understand this aspect of the internet as the WWW continues to expand and more and more nodes saturate the network. To stay on top of the power hierarchy, today’s developers and marketers must have this understanding of, and insight into, large networks such as this one.

Thursday, October 24, 2002

"Small World" Fitness Network


Ok, well it turns out it is a small world after all....I chose the website Fitnessonline.com as the center point of my social network map. It is a website and community that revolves around an interest of mine…fitness. I often visit this site for fitness and nutritional information; I browse the discussion boards, read the articles and resources, and talk in the chat rooms. I traced the links from this site to 7 different websites, which was not too easy to do, I might add…and as it turned out, all of those websites were sites for magazine subscriptions…I was getting a little suspicious at this point. From there I did a brief trace from each of the 7 secondary sites to the other sites they were linked to…nothing interesting came out of that process, though. Keep in mind that the fitness community is very very very large, and a proper map of all the links would be impossible. Anyway, back to the part where I was getting suspicious…as I looked through the 7 secondary sites, all of which were magazine related, I noticed a recurring publisher name…Weider Publications. It turns out that the site I have been interacting with is basically one big giant advertisement for Weider Publications. The information Fitnessonline provides is actually taken from the magazines the site links to. I found this very interesting because I was totally not aware of this :P



Monday, October 21, 2002

Small World Networks, Kevin Bacon, and Business Pt.II

So with “Six Degrees of Separation” and “The Random Universe” stuck in my mind, how can I not relate all of this to business, and more specifically marketing? This unit’s material has marketing written all over it, so that is what I have begun looking into…the application of “Six Degrees of Separation” and randomness within networks, and their role in business and marketing. The business term “Networking” is related to these concepts: by talking to as many “random” people as possible, the chance of a beneficial reward increases with each encounter.

“Don’t pass up an opportunity to be introduced to your best client, simply because you forgot the concept of "six degrees of separation." Everyone you meet knows someone who could do business with you and your company. Remember: a personal referral is worth a thousand sales letters.”

I’m still digging deeper into this so that is all for now, more to come soon!
Small World Networks, Kevin Bacon, and Business

I remember playing the “Kevin Bacon” game (yeah, it was at the Ozone…ok, don’t laugh…) and thinking, “Wow, this is really cool!” without ever realizing the potential of that game for discussing and conceptualizing real-world networks. That game, if you’ve played it before (hopefully not at the Ozone), is related to the concept of the “Small World Network” and the research of social scientist Stanley Milgram. Milgram is the man behind the famous “Six Degrees of Separation” idea, the notion that members of any large social network can be connected to each other through a short chain of shared relations. In his experiments the chains between individuals averaged roughly six links, meaning that about six steps separated any two individuals in a particular social network. The “Kevin Bacon” game I mentioned follows the same notion; it is usually called “Six Degrees of Kevin Bacon,” and the goal is to connect any actor to Kevin Bacon through shared films in as few steps as possible, with most actors turning out to be only a few links away…anyway….Applying this concept to digital networks, the same type of association can be found, except that the number of links rises to about 19: research on the World Wide Web found that any two pages are, on average, about 19 clicks apart.
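Degrees of separation (and a "Bacon number") are just shortest-path lengths, which breadth-first search computes. Here is a minimal Python sketch on a made-up co-star graph in which the hypothetical node "actor_k" stands in for Kevin Bacon. None of these links represent real films.

```python
from collections import deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Make an undirected graph: a shared film links both actors to each other."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees_of_separation(graph: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first search: the fewest links from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, set()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical co-star links; "actor_k" plays the role of Kevin Bacon.
graph = build_graph([
    ("actor_k", "actor_a"), ("actor_k", "actor_b"),
    ("actor_a", "actor_c"), ("actor_c", "actor_d"), ("actor_b", "actor_e"),
])
numbers = degrees_of_separation(graph, "actor_k")
print(numbers["actor_d"])  # 3
```

Because BFS visits nodes in order of distance, the first time it reaches a node is always along a shortest chain, which is exactly what a degree of separation measures.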

The idea of randomness as presented in “The Random Universe” follows a similar path to that of “Six Degrees of Separation”. Both relate to the use of channels, or links, within a network: one uses links to connect individuals within a network, while the other uses them to spread information throughout a network. Tim nicely provides an example of this randomness concept:

“This is a very interesting concept and I think that I have been familiar with it in the past. Look back at rumours. They start with one person and before long the whole school knows about it. I know that analogy from my past, but I haven't really tried to apply it to my present day life.”