How to Build a Search (SEO) Friendly Flash Website
To preface this article, I'm assuming that readers understand the basics of how the major search engines work. They send out spiders to find and 'crawl' websites. Essentially, these spiders or 'robots' visit a site, collect and index its content, and then organize that content along with the billions of other web pages in existence. When a user queries the search engine, it returns an ordered list of the pages it deems most relevant to the user's search term.
In order to best position your Flash site to be found and rank high in the search results, your site should be built using a progressive enhancement approach.
HTML
Progressive enhancement involves building a site for the most basic user first and then adding further enhancements if the user's browser can handle them. This approach dictates building the site in HTML. At this time, HTML is the only markup language that search engines can consistently and semantically read. Google's spider, 'googlebot', is known to have started reading Flash content. However, what it indexes carries no semantic structure and amounts to disjointed chunks of content at best. While it is possible to optimize Flash using certain heavily weighted external elements (link building, <title>, URL), those techniques do not maximize the content's SEO.
At this first, most basic level of site creation, the aim is to provide a site usable by 100% of your users. With this in mind, it's important to remember that search engines should be viewed simply as a different type of visitor to your site. There is nothing extremely complex about their behaviour or the type of information they are looking for. Just as you want all your pages to be accessible to human visitors, you need to ensure search engines can find every single page of your site. In this context, 'page' refers to a unique view of content that can be accessed by entering the domain name and file path. That address should also return the same content no matter who, or what, requests it.
Point #1:
To maximize and enhance Accessibility, Standards Compliance and Search Presence, you need to develop websites using semantic HTML.
CSS
After developing the HTML for your site, the next step is to enhance it in order to give your site a more appealing or 'prettier' look. All 'A grade' browsers support CSS but search engine robots do not. Since the CSS contains the way the content is displayed and not the content itself, the search engines don't NEED to support CSS. Although the robots do read the CSS, they are only ensuring nothing deceitful or 'SPAMMY' is going on. For example, by viewing the CSS a search engine robot can tell if there is white text on a white background or if the content is positioned off the page etc.
Point #2:
Enhance the presentation of your HTML with CSS
JavaScript
With the use of JavaScript you can add another level of enhancement. This is when it becomes important to understand the basics of search engine spider behaviour. Spiders do not run JavaScript, so any content added in this manner will not be seen by the search engine spider and therefore will not be added to the index. This is an especially important aspect to grasp. Problems arise because adding navigation items or other content using JavaScript means the search engines can never find or index that content. In the black hat SEO world, SPAMMERS and/or cloakers make use of this to feed content to the search engines that is not presented to the user. It is highly recommended to stay away from employing these techniques but to nonetheless be aware of the concept.
JavaScript can also add another level of enhancement by manipulating the DOM and having things change when users interact with on page elements such as links, buttons and other form elements.
Point #3:
JavaScript is a user enhancement but it does not work for all users. It will not work for users without JavaScript support and it will not work for search engines. A marginal percentage of users, such as those using screen readers, will have only limited JavaScript support.
Important Tip
With JavaScript turned off in your browser, ensure you can locate each and every page on your site. If there are any pages you cannot locate, it is likely the search engines cannot locate them either.
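The tip above can be roughly automated. The following is a minimal sketch, assuming a hypothetical four-page site held in memory, where each entry is the HTML served with JavaScript turned off. The page paths and the helper names (extractLinks, findUnreachablePages) are illustrative only, and the link extraction is deliberately naive compared to a real spider.

```javascript
// Hypothetical site: each key is a page path, each value is the HTML
// served for that path with JavaScript disabled. All names are illustrative.
const site = {
  "/": '<a href="/about">About</a> <a href="/products">Products</a>',
  "/about": '<a href="/">Home</a>',
  "/products": '<a href="/">Home</a>',
  "/contact": '<a href="/">Home</a>', // no plain HTML link points here
};

// Collect href values from plain anchor tags, as a simple spider might.
function extractLinks(html) {
  const links = [];
  const pattern = /<a\s[^>]*href="([^"]+)"/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Walk the site from the start page, following only HTML links,
// and return the pages that were never reached.
function findUnreachablePages(pages, start) {
  const has = (path) => Object.prototype.hasOwnProperty.call(pages, path);
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const path = queue.shift();
    for (const link of extractLinks(has(path) ? pages[path] : "")) {
      if (has(link) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return Object.keys(pages).filter((path) => !seen.has(path));
}

console.log(findUnreachablePages(site, "/")); // lists "/contact"
```

In this made-up site the contact page is only reachable through navigation added by JavaScript, so it turns up as unreachable, which is exactly the kind of page a spider would miss.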
Flash
The next level of enhancement is adding a Flash presentation layer. A web visitor can only view this presentation layer if they have the Flash player plug-in installed. This is best detected through the use of JavaScript. If the user has the ability to run JavaScript and has the correct Flash player version, you can serve this user the fully enhanced site. I recommend using swfobject (http://blog.deconcept.com/swfobject/). If you incorporate Flash into your site correctly, you can hide the HTML content from the user and present just the Flash. This enables the search engines to locate all of your content while your visitors still see the Flash.
Note:
Because the search engines do not run the JavaScript, they simply ignore the fact that it is there, index the HTML content and follow the links.
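To make the detection step concrete, here is a minimal sketch of the decision a detection script has to make. It assumes the plug-in reports a description string of the form "Shockwave Flash 9.0 r115", which is what many browsers expose for the Flash plug-in. It does not cover browsers that expose Flash through ActiveX, and the function names are my own for illustration. In practice a library such as swfobject does this work for you.

```javascript
// Parse a plug-in description such as "Shockwave Flash 9.0 r115"
// into [major, minor, revision]. Returns null when no version is found.
function parseFlashVersion(description) {
  const match = /(\d+)\.(\d+)(?:\s*r(\d+))?/.exec(description || "");
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3] || 0)];
}

// Compare versions part by part: major first, then minor, then revision.
function meetsRequirement(installed, required) {
  if (!installed) return false;
  for (let i = 0; i < 3; i++) {
    if (installed[i] > required[i]) return true;
    if (installed[i] < required[i]) return false;
  }
  return true; // identical versions
}

// Decide which presentation layer this visitor should get.
// Anything that cannot be confirmed falls back to the HTML.
function chooseLayer(pluginDescription, required) {
  const installed = parseFlashVersion(pluginDescription);
  return meetsRequirement(installed, required) ? "flash" : "html";
}

console.log(chooseLayer("Shockwave Flash 9.0 r115", [8, 0, 0]));
console.log(chooseLayer(undefined, [8, 0, 0]));
```

The fallback is the important design choice: a visitor with no plug-in, an old plug-in, or no JavaScript at all never leaves the HTML version.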
Point #4:
Use JavaScript to add your final level of enhancement: Flash.
Flash Applied
In theory all of this makes great sense. In the real world, however, there is a very important point that must be addressed. It is essential that the content in both the HTML and the Flash is identical. If for some reason you must display content in one and not the other, it MUST be displayed in the Flash. If you present content to the search engines and not to the user, it is considered a form of cloaking and can get your site banned from search engine indexes.
It is possible to display the exact same content to all users of your site if the HTML and the Flash both draw on the very same file. The best way to achieve this is by using XML. Both Flash and web application servers can parse XML. In the current Flash player the XML is handled through the DOM, while on web application servers there are several ways to handle it, depending on the language. In PHP, for example, it can be done through the DOM or with XSL.
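As a rough illustration of the shared file idea, the sketch below builds the HTML view from a small XML document, the same document the Flash movie would load. The XML layout, the element names and the helper functions are all assumptions made for this example. The regular expression stands in for a proper XML parser only to keep the example self-contained, and it would not cope with nested or attributed elements.

```javascript
// Hypothetical shared content file; the element names are assumptions.
const pageXml = `
<page id="about">
  <title>About Us</title>
  <body>We build watches &amp; clocks.</body>
</page>`;

// Pull the text of one simple, non-nested element out of the XML.
// A real server would use a proper XML parser (DOM or XSL) instead.
function readElement(xml, name) {
  const pattern = new RegExp("<" + name + ">([\\s\\S]*?)</" + name + ">");
  const match = pattern.exec(xml);
  return match ? match[1].trim() : "";
}

// Build the HTML view from the same text the Flash movie would display.
// Entities such as &amp; pass through untouched, since they are valid in HTML too.
function renderHtml(xml) {
  const title = readElement(xml, "title");
  const body = readElement(xml, "body");
  return "<h1>" + title + "</h1>\n<p>" + body + "</p>";
}

console.log(renderHtml(pageXml));
```

Because both views are generated from one file, an edit to the XML changes the HTML and the Flash together, and the two cannot drift apart.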
To begin summing things up, we have ensured that the website will build its own presentation depending on the user's browser and that the content will be the same for all users... or have we? What happens when a user runs a search, an interior page is shown in the search results, and the user has a browser with full support and a Flash player? This user will be taken to the home page of the site in Flash. This is very bad: the user is shown different content from what the search engine indexed at that address, which is precisely what is considered cloaking.
The way to get around this is to pass a value into the Flash that tells it which page this is and what it needs to display. This is often referred to as 'deep linking', although there may be multiple names for the technique. The best way to manage this is through the use of the # sign in the URL. The search engines drop everything after the # from their indexing, and, with some additional JavaScript tricks, it allows you to manage the back and forward buttons in the browser window. I recommend using either the YUI History (http://developer.yahoo.com/yui/history/) or History Keeper (http://www.unfocus.com/projects/historykeeper/). You can see an older version of swfobject and History Keeper in action here: http://www.orange-project.com/ as well as on the first project I applied this technique to myself, http://www.rolex.com. I heard Geoff Stearns was playing around with SWFAddress, and I am sure it does the same sort of thing.
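Here is a minimal sketch of the # mapping, assuming a hypothetical URL scheme in which the Flash version lives at the site root and every interior HTML page has its own path. Both function names and the scheme itself are illustrative, and the history libraries mentioned above handle the back and forward buttons, which this sketch does not attempt.

```javascript
// Turn a location hash such as "#/products/watches" into the page
// path the Flash movie should display. An empty hash means the home page.
function pageFromHash(hash) {
  const path = (hash || "").replace(/^#/, "");
  return path === "" ? "/" : path;
}

// Given the path of the HTML page a search result points to, build the
// address that shows the same page inside the Flash version.
function toDeepLink(pathname) {
  return pathname === "/" ? "/" : "/#" + pathname;
}

console.log(toDeepLink("/products/watches")); // "/#/products/watches"
console.log(pageFromHash("#/products/watches")); // "/products/watches"
```

With a mapping like this, a Flash-capable visitor who arrives on an interior page from a search result can be handed the matching Flash view instead of the home page, and the value read from the # is what gets passed into the Flash.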
In conclusion, there are undoubtedly many gotchas that will crop up along the way, but this article covers the basics of what needs to be done and the best practices for implementing each step.