Efficient Rendering, A La Mark.

Rendering efficiently is one of those topics that is widely spoken about in the world of 3D graphics. Asking a question like ‘What is the best way to render a bunch of Objects’ is as open ended as asking ‘What is the best way to cook chicken soup.’ It is all based on application and preference and in all likelihood, there is no universal answer to this question. However, there are a series of specific solutions to this problem that can help in creating a mechanism that is best for the particular situation.

My problem is rather generic and will require a generic solution. I have a bunch of objects that need to be sorted by certain criteria in order to minimize state changes. It has to also support Shaders (Cg in my case) and it should minimize Shader state change between object rendering. Furthermore, an object must be generic enough to support complex models with bones and animations. On top of that, it should be easy to use. To start, we might need to break this down into smaller parts.

Objects:
For the time being, let’s refer to an Object as a list of vertices inside a vertex buffer. It may or may not be accompanied by an index buffer, but in most cases it will be. Every Object that exists in our world will be sent to the graphics card separately to be rendered. This is inevitable until we support something more complex, like hardware-based instancing.

State Changes:
Unless you want all objects to be rendered in the same way, in the same spot, and with the same vertices, you probably want some sort of state change. A state change is a change in any part of the system, whether it is the position of the camera, a new Object to be drawn, or a new effect. To quantify state changes, it helps to group them by type: swapping Render Targets, Shaders, Techniques, Shader parameters, and using a different vertex or index buffer to draw an object. The order of this list matters, because the changes at the beginning are the most expensive and the changes at the end are the least expensive.
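To make the ordering concrete, here is a minimal sketch of sorting draw calls so that the most expensive state changes the least often. The DrawItem struct, its ID fields, and the helper names are all hypothetical; this is only one way the idea could look, not the engine's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record: each field is an opaque ID for one kind of state.
// Fields are declared from most expensive to least expensive to change.
struct DrawItem {
    std::uint32_t renderTarget;
    std::uint32_t shader;
    std::uint32_t technique;
    std::uint32_t parameterSet;
    std::uint32_t vertexBuffer;
};

// Lexicographic comparison: the most expensive state is the primary sort key.
inline bool lessByStateCost(const DrawItem& a, const DrawItem& b) {
    return std::tie(a.renderTarget, a.shader, a.technique, a.parameterSet, a.vertexBuffer)
         < std::tie(b.renderTarget, b.shader, b.technique, b.parameterSet, b.vertexBuffer);
}

inline void sortByStateCost(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), lessByStateCost);
}

// Counts how often the shader differs between consecutive draws.
inline int countShaderSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].shader != items[i - 1].shader) ++switches;
    }
    return switches;
}
```

After sorting, all draws that share a Render Target and Shader sit next to each other, so the expensive switches happen once per group instead of once per object.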

The RenderGraphNode:
This is a generic interface from which many types of nodes will be derived. Each derivation of the node will embody one of the changes listed above. In addition to the state change, the node will also be a container for child nodes. Usually such a node would be generic enough to contain any type of node. However, in our case, we want to preserve an order to our nodes so that we optimize the state changes. The root of our tree will be a change in Render Targets. For the most part, there will only be one Render Target, the backbuffer (our screen). When a child node is added, it will automatically be sorted into the correct place in order to minimize state changes. This is especially important for Shader parameter changes because there can be multiple parameters in one Shader.
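A rough sketch of what such a node could look like follows. The NodeKind enum, the integer stateId, and the string log standing in for real GPU calls are assumptions made purely for illustration. The sketch keeps children sorted on insert, but it does not enforce which kind of node may sit under which.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds, listed from most to least expensive state change.
enum class NodeKind { RenderTarget, Shader, Technique, ShaderParameter, Geometry };

class RenderGraphNode {
public:
    RenderGraphNode(NodeKind kind, int stateId) : kind_(kind), stateId_(stateId) {}
    virtual ~RenderGraphNode() = default;

    // Applies this node's state change; a derived node would talk to the GPU here.
    // This sketch only records "kind:id" so the traversal order can be inspected.
    virtual void apply(std::vector<std::string>& log) const {
        log.push_back(std::to_string(static_cast<int>(kind_)) + ":" + std::to_string(stateId_));
    }

    // Inserts the child at its sorted position so equal states end up adjacent.
    RenderGraphNode* addChild(std::unique_ptr<RenderGraphNode> child) {
        auto pos = std::upper_bound(
            children_.begin(), children_.end(), child,
            [](const std::unique_ptr<RenderGraphNode>& a,
               const std::unique_ptr<RenderGraphNode>& b) { return a->stateId_ < b->stateId_; });
        return children_.insert(pos, std::move(child))->get();
    }

    // Depth-first traversal: apply this node's state, then visit the children.
    void render(std::vector<std::string>& log) const {
        apply(log);
        for (const auto& child : children_) child->render(log);
    }

private:
    NodeKind kind_;
    int stateId_;
    std::vector<std::unique_ptr<RenderGraphNode>> children_;
};
```

Because the sorting happens inside addChild, the caller never has to think about ordering; it just adds nodes and the tree stays in a state-friendly order.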

The RenderGraph:
In order to encapsulate all this, I need a class that will be the owner of Render Targets. It will be the only thing passed into the Renderer for drawing. At that point it will traverse the tree and render.

Sounds simple, right? Yeah, but something doesn’t feel good about this design.

If we leave the design at this point, we are left with a bunch of nodes that the user has to put together. This design is acceptable to some. In fact, OpenSceneGraph uses such a design for its SceneGraph. It is a bunch of classes that fit together in a tree fashion. Throw a Visitor pattern into the mix for easy iteration and you have an engine. I’m not quite as happy with that design as my OpenSceneGraph counterparts are. The problem, in my eyes, is that it’s very verbose. Putting together a simple scene with an airplane in it was quite lengthy. You have to add a GeometryNode to a TechniqueNode to a ShaderNode to a RenderingTarget, and so on.

So back to my original question, what is the best way to implement something like this? When I figure it out, I’ll write about it.

Blog++

I’m converting my blog to something a bit more useful. My long rants about my game engine were all leading towards a game of some sort. In the process I have recruited a friend to help me realize that dream. So, give a kind welcome to Alex.

Our first title will be a turn-based strategy war game by the name of ‘The Mortal Realm.’ It will feature my 3D engine and a robust battle system. As far as complexity goes, this game is one of the simplest we have come up with. It’s a simple point-and-click style of game with very minimal artwork. I’m hoping it will be a great test bed for my engine as well as Alex’s AI.

Too Busy…

I’ve been pretty bad at keeping my blog up to date as of late. With my wedding just around the corner, most of my time has been wasted planning things and what not. I found myself wishing for an extra hour or two a day just so I could fit in all the stuff that I like doing, such as programming and reading books. But, alas, it is an important part of life that requires a great deal of effort. And there is really nothing I can do about it (perhaps eloping would be easier) - everything, and I mean everything costs money.

I’m nearing my one-year anniversary at McAfee (amongst other things) and I am finally at a comfortable stage in my life. I have a stable job which does not work me to the bone. I couldn’t be happier with it. But my financial situation is inching closer and closer to the red line, all because my expenses as of late have been sucking up my savings. One of these was the purchase of a new house in Hamilton. So, I’m looking for a way to get a few more bucks. I started looking into something that has interested me for quite some time - Teaching.

Hamilton is home to Mohawk College. I attended this school before going to Seneca College and found it a decent place to learn. Though, after learning what they had to offer, I opted for the Computer Programming stream at Seneca. What interests me is not a full-time position but part-time night school classes. A teacher at Seneca told me that Ontario requires its colleges to have a certain number of professors with a Masters degree or higher, but only for day school and only in their core streams. Night school does not have that requirement, but I bet having a Masters looks good when applying for a teaching position. With my Bachelors, I might be beaten out of a teaching position by someone with more educational experience. So I’m hoping my work experience will carry some weight.

On Mohawk’s webpage they specify a few courses that interest me (if they are available in night school). These are ‘Introduction to C,’ ‘Object Oriented Design,’ and so on. Being first year courses, I fear that they might not be available in night school. I will be contacting the head of the Computer Department at Mohawk with regards to this. If anyone from Seneca, or perhaps any other college in Canada, can offer advice, it would be greatly appreciated.

How Do Patents Apply To Me?

I’ve been diligently working on a scene partitioning system which combines an Octree with a Uniform Grid. Basically, you build a loose Octree that starts 4 levels deep. When a node reaches a critical mass, it subdivides into another level. The maximum number of levels you can have (while still being optimal) is 7. So, let’s assume that we have a detailed scene with a 7-level Octree. At the bottom of the Octree, each node is 1/128th the size of the entire area along each axis, and it has the same proportions as the entire area. You can build a Uniform Grid out of the bottommost nodes, giving you the best of both worlds.

Once all this is built, adding or removing an item from the grid is a matter of simple division to find the exact cell in the Uniform Grid where the object belongs. Since pointers are shared between Octree Nodes and Uniform Grid Nodes, you essentially add an item to the Octree in O(1) time (adding to a tree structure usually takes O(log n) time). Collision detection with simple objects is O(1) time, while with complex objects it is O(log n) time. What I have done is make the Octree a bit faster in some areas. Good idea, isn’t it?
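The “simple division” step can be sketched roughly as follows. The UniformGrid struct, its field names, and the assumption of a cubic world are mine, for illustration only; the real structure would also hold the pointers shared with the Octree nodes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cubic world: the deepest Octree level splits each axis into
// 2^7 = 128 cells, which together form the Uniform Grid.
struct UniformGrid {
    float worldMin;    // lower bound of the world on every axis
    float worldSize;   // edge length of the world cube
    int cellsPerAxis;  // 128 for the 7-level case

    // Maps one coordinate to a cell index with a single division, clamped to the grid.
    int axisIndex(float v) const {
        const float cellSize = worldSize / static_cast<float>(cellsPerAxis);
        int i = static_cast<int>((v - worldMin) / cellSize);
        if (i < 0) i = 0;
        if (i >= cellsPerAxis) i = cellsPerAxis - 1;
        return i;
    }

    // Flattens (x, y, z) into one array index; the cost does not depend on
    // how many objects or nodes exist, which is where the O(1) comes from.
    std::size_t cellIndex(float x, float y, float z) const {
        const std::size_t n = static_cast<std::size_t>(cellsPerAxis);
        const std::size_t ix = static_cast<std::size_t>(axisIndex(x));
        const std::size_t iy = static_cast<std::size_t>(axisIndex(y));
        const std::size_t iz = static_cast<std::size_t>(axisIndex(z));
        return (iz * n + iy) * n + ix;
    }
};
```

With the index in hand, the grid cell (and through the shared pointer, the matching bottom-level Octree node) is one array lookup away, with no tree walk required.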

Here is my problem. While randomly googling this topic, I found a patent for this idea. The patent is very similar to what I just described. What I want to know (for all you law junkies out there) is how does this affect me? Can I get sued? Does it matter that my implementation is my own and not copied from the patent? Does it matter that my implementation is Open Source? Are there ways to get around this patent (my implementation is different but algorithmically similar)?

Any suggestions are welcome.

U.S. And Human Rights

I knew that the U.S. had some Human Rights problems of its own, but it came as a shock to me that the U.S. has a long and painful record of Human Rights violations. Even if you are not into politics, it makes for a very enlightening read.

I’ve Been Busy…

I always hated the holidays. Christmas is probably the worst time of the year, and what makes it so is all the rushing and running around and pointless stress. Giving and spending time with your family should be the focus of Christmas, not some faceless marketing campaign. On top of all this, I moved back to Hamilton. I should note that if you’re planning on moving in the middle of winter, make sure you do it on the warmest day possible. This was our second move on one of the coldest days of the year and it was not the least bit pleasant. Regardless, all is well now and I have the next 10 months to recuperate.

I got my hands on a copy of Windows 7, and I am not impressed. It is, hands down, Vista Service Pack 2. Vista never impressed me, and Windows 7 is barely an upgrade from that. I have to admit, some of its features run faster, as they should, and are generally more efficient. For example, I run Windows 7 in VMware. This VMware image is hosted by a Vista machine running on raw hardware. Even though Windows 7 is virtualized, it still runs faster than Vista on raw hardware. On the other hand, all the settings are buried so far away from the user that it becomes absurd. Changing network settings takes twice as long as it used to (and this is not including the UAC pop-ups). As a word of advice, if you’re a poweruser and you want ease of use, stick with Windows XP or 2000. If you are a Vista lover, Windows 7 is for you. Basically, ditch Vista by going back to XP/2K, or upgrade to Windows 7 once it matures.

On a side note, on the off chance I will upgrade to Windows 7, I have been looking for alternate shells and some seem promising. I’d rather have a shell that is poweruser friendly and fast. Litestep is one such product. Unfortunately, there is no way to actually replace all the functionality of explorer so there will be some annoyances. Linux is looking much better each time I look at Vista or Windows 7. With the way this post is going, maybe I should rename my blog to Grumble Grumble, Things that piss me off.

I’m currently looking into a thing called “Visual Studio Visualizers.” They are small scripts that allow you to modify the way the debugger displays information. I’m not done my research yet, but expect a nice big article on this. It’s barely documented and really cool :).

Radiant Update

I held a bet with my brother to see who could guess the number of lines of code (or at least, get close to it) that my rendering engine is comprised of. Neither of us was close, but I got a good sense of how much work went into this giant mash of code. The total is about 25,000 lines: approximately 12,000 lines of computational code and about 13,000 lines of comments. This number does not include blank lines and such things. Regardless, it’s a staggering number for a project that is being worked on by one person, part time.

The engine itself is about 45% complete, with the majority of planning done. One of the components, my math module (it handles math stuff and collision detection), is finally finished. I can breathe a sigh of relief; it’s not easy stuff. Unfortunately, I didn’t write any unit tests for it, so I don’t know if it actually works or not. The project is open source. So, if anyone is masochistic enough to write tests for my math module, by all means :).

Primary Export: Pain

Everyone knows (or should know) that when you put together a DLL, you need to export functionality so that programs using your DLL know where to find your functions. This is usually done by prefixing classes or functions with __declspec(dllexport) or by manually writing a definition file. Straightforward and right to the point. But what happens when you need to export something that does not have a name? Say, for example, an overloaded new operator. What the hell does a new operator look like as a definition symbol?

I’ll give you a hint: it’s not human readable!

So, before I actually spill the beans and tell you what I did, I have to explain why I did it, because it’s rather interesting. Radiant, my game engine, is split into 6 DLLs, all of which touch and create dynamically allocated memory somewhere. The problem with allocating memory all over the place is that you need to free it from the same module (DLL) that allocated it, since each module may be using its own heap. To make it more complex, I had to overload the new and delete operators in order to wrap the _aligned_malloc() and _aligned_free() calls. This is a special type of allocation that allows you to align dynamic memory to an address that is divisible by 16 (or another alignment value). This is crucial if you’re using SSE or a similar instruction set, because aligned SSE loads and stores require their operands to be aligned to 16 bytes.
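As a rough illustration of the wrapping described above, here is a sketch that routes new and delete through an aligned allocator. Note the differences from what the engine does: it overloads the operators on a single hypothetical type (Vec4) instead of globally, and it falls back to std::aligned_alloc (C++17) on compilers other than MSVC. The helper names alignedAlloc16, alignedFree16 and isAligned16 are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Allocates `size` bytes aligned to 16 bytes; throws std::bad_alloc on failure.
inline void* alignedAlloc16(std::size_t size) {
    if (size == 0) size = 1;
#if defined(_MSC_VER)
    void* p = _aligned_malloc(size, 16);
#else
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    const std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
    void* p = std::aligned_alloc(16, rounded);
#endif
    if (!p) throw std::bad_alloc();
    return p;
}

// Releases memory obtained from alignedAlloc16.
inline void alignedFree16(void* p) noexcept {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Hypothetical SSE-friendly type that routes its own new/delete through the helpers.
struct Vec4 {
    float x, y, z, w;
    static void* operator new(std::size_t size) { return alignedAlloc16(size); }
    static void operator delete(void* p) noexcept { alignedFree16(p); }
    static void* operator new[](std::size_t size) { return alignedAlloc16(size); }
    static void operator delete[](void* p) noexcept { alignedFree16(p); }
};

// True when the address is a multiple of 16.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Keeping the allocate and free calls paired inside one pair of helpers is also what makes it possible to funnel every allocation through a single DLL later on.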

Anyway, going back to the problem at hand, I have a bunch of overloaded operator functions that cannot be exported the usual way, because if I try to add the __declspec(dllexport) prefix to them, the compiler will scream and tell me that the declarations of the functions do not match up with what is defined internally. Basically, what I am stuck with are a handful of functions that cannot be exported programmatically. This is where the definition file comes into play. Exporting a function or class is as easy as entering its name in the definition file under the heading of EXPORTS. But here is the kicker: the overloaded ‘new’ and ‘delete’ operators do not have a name! They are declared internally in a header that exists in the compiler’s own static data, and there’s no way to override inclusion of that header. Therefore, the only way is to manually enter each function’s mangled name into the definition file.

The mangled name looks something like the following:

??2@YAPAXI@Z (void * __cdecl operator new(unsigned int))
??3@YAXPAX@Z (void __cdecl operator delete(void *))
??_U@YAPAXI@Z (void * __cdecl operator new[](unsigned int))
??_V@YAXPAX@Z (void __cdecl operator delete[](void *))



In fact, those mangled names are fairly generic and may not match up exactly, but they are very similar to what they should be. A more specific example of the names can be found in the Visual Studio directory under VC\crt\src\intel\_sampld_.def. This file contains a slew of definitions. What you’re looking for are the first four definitions, which look very similar to the ones posted above. If you are running on an x64 or Itanium architecture, there are definition files for those architectures as well.

After a successful compile, all dynamic memory is allocated and deallocated in a single DLL’s memory space. This prevents the Heap Corruption errors I was getting before and allows me to further enhance the allocation and deallocation of dynamic memory. Woot!

I would suggest reading the following link because it contains a very good description of how this process works. Unfortunately, I stumbled upon this file AFTER I already fixed this problem. Alas, C’est La Vie.

Multicore Processing And Game Engines

I have been passively researching multicore processing for the last few weeks and I came to the conclusion that it is rather easy to implement. In its simplest form, all you need to do is create threads and have them do Jobs. The OS will then schedule each thread onto an available core. Having multiple cores lets those threads run at the same time, as opposed to the old time-slicing method of single-core processors. But, at the very base level, it’s rather primitive and can actually be improved upon.

Creating and closing threads is fairly fast, but it may become a bottleneck if the engine does it constantly. The best way to handle this is to not do it, obviously. This is where a Thread Pool comes in handy. It creates a bunch of worker threads that don’t get destroyed until the program exits. Each thread will sit idle until a job has been passed to it to be processed. This involves the use of critical sections and semaphores, and it is much faster than allocating and deallocating threads. On Win32, a critical section is generally the fastest of the synchronization primitives for threads within one process (alternatives include Mutexes, Events, and so forth). The rule of thumb is to create just enough threads so that the OS does not have to time-slice. This is usually done by allocating [num of cores] + 1 threads.

In order for a Thread Pool to work properly, it requires a few things. Firstly, a Job queue. This is a long list of jobs that get distributed between the threads as threads become available. Secondly, some sort of thread state management. It includes a set of states that the threads can be in. The basic states are ‘Working’ and ‘Idle’, but this can vary with the amount of complexity you add to the Thread Pool. Lastly, it requires data sharing. I suggest writing an object that wraps data in a locking/unlocking mechanism (Semaphores come in handy for this task). Once these aspects are implemented, the Thread Pool is basically finished.
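To show how these three pieces fit together, here is a minimal sketch of a Thread Pool. It is not the Win32 version described above: it swaps critical sections and semaphores for the portable std::mutex and std::condition_variable, and the class and method names (ThreadPool, submit, waitIdle) are my own choices for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    // Spawns the worker threads once; they live until the pool is destroyed.
    explicit ThreadPool(std::size_t threadCount) {
        for (std::size_t i = 0; i < threadCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Lets the workers finish the jobs that are still queued, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    // Adds a job to the queue and wakes one idle worker.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        wake_.notify_one();
    }

    // Blocks the caller until every submitted job has finished.
    void waitIdle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // 'Idle' state: sleep until a job arrives or the pool shuts down.
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // 'Working' state: the job runs outside the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                --pending_;
            }
            idle_.notify_all();
        }
    }

    std::mutex mutex_;                        // guards jobs_, pending_ and stopping_
    std::condition_variable wake_;            // signals workers that there is work
    std::condition_variable idle_;            // signals waitIdle() that jobs finished
    std::queue<std::function<void()>> jobs_;  // the Job queue
    std::size_t pending_ = 0;                 // jobs queued or currently running
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A caller would construct the pool once, call submit() for each job, and call waitIdle() when it needs the results. The ‘Working’ and ‘Idle’ states are implicit here: a thread is idle while it sleeps on the condition variable and working while it runs a job.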

Usage is another key aspect. Let’s assume that you either use the built-in Win32 thread pool or roll your own; it doesn’t matter which. Furthermore, you have some very repetitive code that you want to multithread. If you don’t quite know what is multithreadable, the best place to start looking is in any for loop. The place where I’m going to use my Thread Pool is in a loop where I would have to update some world object, such as a player state or even scene management. For example, this loop might call your world object and cause it to return collision information with its nearest neighbors. Plug the world objects into the Thread Pool, have it run asynchronously, and output some data into some shared object. As it’s doing this, have the main thread wait or do some other processing until the output is ready. Once it’s all done, the Thread Pool will suspend its threads and the main thread can resume doing its job.

What this Thread Pool is designed to do is complete small tasks asynchronously. Sticking a dedicated piece of code on one thread (such as a sound subsystem, or networking) is rather counterproductive for a Thread Pool because it ties up the thread until the end of the program. I would suggest that a substantially heavy subsystem be put on a separately spawned thread instead. Another word of advice that I came across is to keep the amount of writing to shared data on each thread to a minimum, because it requires locks. The best approach is to have each thread write to its own dedicated memory that is attached to the Job that it’s processing. Keep the shared data read-only when possible.
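The “own dedicated memory” advice can be sketched like this. For brevity the example spawns plain std::thread workers instead of going through a pool, and the WorldObject struct and countNeighbors function are hypothetical stand-ins for real world objects and real collision queries.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical world object; `position` stands in for whatever state gets read.
struct WorldObject {
    float position;
};

// Counts, for every object, how many other objects lie within `radius`.
// The world is shared read-only; each worker writes only to its own output slots.
inline std::vector<int> countNeighbors(const std::vector<WorldObject>& world,
                                       float radius, std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<int> result(world.size(), 0);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < threadCount; ++t) {
        threads.emplace_back([&world, &result, radius, t, threadCount] {
            // Worker t handles objects t, t + threadCount, t + 2 * threadCount, ...
            for (std::size_t i = t; i < world.size(); i += threadCount) {
                int count = 0;
                for (std::size_t j = 0; j < world.size(); ++j) {
                    const float d = world[i].position - world[j].position;
                    if (j != i && d * d <= radius * radius) ++count;
                }
                result[i] = count;  // no lock needed: slot i has exactly one writer
            }
        });
    }
    for (auto& thread : threads) thread.join();  // the main thread waits for the output
    return result;
}
```

Since no two workers ever touch the same output slot and nobody modifies the world while the workers run, the loop needs no locks at all.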

I’m in the process of implementing my own Thread Pool and once it’s done, I’ll post some metrics.

Work++; // Again!

I’ve been slacking. No, really, mega slacking. It’s been almost two months since I wrote something here. Earlier, I wrote about my latest employment opportunity at PWLabs. Well, that job ended abruptly. There was nothing wrong with that job, I enjoyed it fully, but several days after I started, I got a call from a recruiter who works for McAfee (yes, the antivirus maker). Impeccable timing! He hooked me up with the dev team at McAfee and we had a phone interview. That eventually blossomed into a full 5 hour interview (you heard me correctly, 5 hours!) where I took two tests on top of an in-person interview. Needless to say, I got the job and my brief career at PWLabs came to a close.

The stuff that they have me doing right now is guarded by an NDA, but it’s mostly going to be Win32 programming. My experience at BumpTop and at Strike Technologies helped me get my foot in the door. I always avoided huge Corporate jobs but this one actually made me excited.

For those who don’t know, McAfee’s engineering department is located in Waterloo, so the drive from Toronto is a bit time consuming. I ended up buying a 2008 Hyundai Accent for the trip. It’s not a bad car. Nevertheless, I will be moving down to this area eventually. Some places that I was looking at are in Cambridge, Kitchener, and Guelph. But that won’t happen for a few years anyway.