Jon Olick

Realtime AI Image Upscaling (Super Resolution)

2/18/2021

The following AI Image Upscaler is built from a single-file deep neural network library (jo_nn.h) and will be able to ship to many common user targets (CPU/GLES/OpenGL/Vulkan/DX11/DX12/Metal). The plan is to release jo_nn.h once it's more feature complete with regards to the production targets. Currently, the CPU/GLES/WebGL/OpenGL/Vulkan paths are in good shape.

Until the release, an example output of image super resolution...


How to Compress an Index Buffer

1/26/2021

This article goes over, in more detail, the technique to compress index buffers presented in my 2007/2008 presentations.

Update: Fixed some bugs in my pseudo code. Also important to point out that your choice of index buffer optimizer greatly affects the effectiveness of this technique. Hence, look for a future post where I talk about that a bit. 

The process is as follows:

First clean up the data...
1) Optimize the index buffers for cache coherency (another article on this later eventually; there are better ways than the typical methods)
2) Remove degenerate triangles (triangles with two or more indexes which are identical)
3) Remove duplicate triangles (triangles which occur more than once in the same index buffer, even if in different orders)
4) Order the vertexes such that they appear in the same order as the index buffer itself (a sketch of this remapping follows below).
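
As a rough illustration of step 4, here is a minimal sketch (hypothetical variable names, not production code) of remapping vertexes into first-use order so they appear in the same order the index buffer references them:

// Sketch: remap vertexes to first-use order (step 4 above).
// Assumes 32-bit indexes and that new_vertexes is a separate destination array.
int *remap = (int*)malloc(sizeof(int) * num_vertexes);
for(int i = 0; i < num_vertexes; ++i) remap[i] = -1;
int next_vertex = 0;
for(int i = 0; i < num_indexes; ++i) {
  if(remap[index_buffer[i]] < 0) {
    remap[index_buffer[i]] = next_vertex++; // first time this vertex is referenced
  }
  index_buffer[i] = remap[index_buffer[i]];
}
// Permute the vertex data to match: vertex v moves to slot remap[v].
for(int v = 0; v < num_vertexes; ++v) {
  if(remap[v] >= 0) new_vertexes[remap[v]] = old_vertexes[v];
}
free(remap);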

In the next step, we note that each triangle typically re-uses many elements of the previous triangle - so don't repeat yourself! The most common patterns occur in triangle fans and triangle strips, which the index buffer optimizer creates in abundance. These usually have two repeating indexes and one new index. There are some other cases with only two new indexes and such, but they are rare enough that we just escape all other cases to 3 new indexes. Additionally, we can losslessly rotate the indexes of the triangle to reduce the set of possibilities we need to encode.

So we in effect create a 2-bit "repeat table", with an entry per triangle which describes the most common repeating patterns - and we also create a new index buffer which has those repeats omitted.

For the 2-bit table, each value would mean:
0 == first index same as last triangle's first index, second index same as last triangle's third index.
1 == first index same as last triangle's third index, second index same as last triangle's second index.
2 == first index same as last triangle's second index, second index same as last triangle's first index.
3 == escape to encode all 3 indexes as if not repeated.

In code this would be something like:

// need a 2-bit entry for each triangle, except the first one which is assumed to be "3"
char *repeat_table = (char*)malloc(num_indexes/3-1);
char *rptbl = repeat_table; // the 2-bit repeat table, one entry per byte for now. entropy coder will do the right thing later.
int *unq_idxs = unique_index_buffer;
int *idxs = index_buffer;
int *idxs_end = index_buffer + num_indexes;
// First triangle is always there.
*unq_idxs++ = *idxs++;
*unq_idxs++ = *idxs++;
*unq_idxs++ = *idxs++;
for(; idxs < idxs_end; idxs+=3) {
  int *prev_tri = idxs-3;
  bool is_idx_old[3] = {false};
  int num_old_idxs = 0;
  for(int j = 0; j < 3; ++j) {
    if(idxs[j] == prev_tri[0] || idxs[j] == prev_tri[1] || idxs[j] == prev_tri[2]) {
      is_idx_old[j] = true;
      ++num_old_idxs;
    }
  }
  // Previous cleanups (duplicate removal) should guarantee all 3 indexes never repeat.
  assert(num_old_idxs != 3);
  // If we have less than 2 index repeats, then just encode an escape into the repeat table.
  if(num_old_idxs < 2) {
    memcpy(unq_idxs, idxs, sizeof(*idxs)*3);
    unq_idxs += 3;
    *rptbl++ = 3;
    continue;
  }
  // Rotate the triangle to match one of the encodings if possible.
  // We want the two index repeats in the first two indexes of this triangle.
  while(!is_idx_old[0] || !is_idx_old[1]) {
    // Rotate the triangle
    int tmp = idxs[0];
    memmove(idxs, idxs+1, sizeof(*idxs)*2);
    idxs[2] = tmp;
    bool btmp = is_idx_old[0];
    memmove(is_idx_old, is_idx_old+1, sizeof(*is_idx_old)*2);
    is_idx_old[2] = btmp;
  }
  if(idxs[0] == prev_tri[0] && idxs[1] == prev_tri[2]) {
    *unq_idxs++ = idxs[2];
    *rptbl++ = 0;
    continue;
  }
  if(idxs[0] == prev_tri[2] && idxs[1] == prev_tri[1]) {
    *unq_idxs++ = idxs[2];
    *rptbl++ = 1;
    continue;
  }
  if(idxs[0] == prev_tri[1] && idxs[1] == prev_tri[0]) {
    *unq_idxs++ = idxs[2];
    *rptbl++ = 2;
    continue;
  }
  // Two repeats, but not in one of the three encodable patterns - escape.
  memcpy(unq_idxs, idxs, sizeof(*idxs)*3);
  unq_idxs += 3;
  *rptbl++ = 3;
}

The next step is to notice that the indexes which are left are typically sequential (0,1,2,3,4,5,6,7,8,etc) as we re-ordered the vertex buffer to specifically be that way up-front. We can encode that explicitly with a 1-bit table which represents that...
0 == not sequential
1 == sequential

code for that would look something like...

int num_unique_indexes = unq_idxs - unique_index_buffer;
int *chaos_index_buffer = (int*)malloc(num_unique_indexes * sizeof(int)); // maximum possible size
int *chaos_idxs = chaos_index_buffer;
char *chaos_bits = (char*)malloc(num_unique_indexes);
for(int i = 0, j = 0; i < num_unique_indexes; ++i) {
  if(unique_index_buffer[i] == j) {
    chaos_bits[i] = 1;
    ++j;
  } else {
    chaos_bits[i] = 0;
    *chaos_idxs++ = unique_index_buffer[i];
  }
}

Next up in the compression, notice that the difference between consecutive values is typically small and can be represented in a small number of bits by delta encoding. A word of warning though: this was built in the era of SPUs where you had to chunk your data up, so depending on your data the deltas encoded below may not be worth the effort. I would bet there is some gain on average for most uses, though.

int num_chaos_indexes = chaos_idxs - chaos_index_buffer;
// Delta-encode the indexes 
for(int i = num_chaos_indexes-1;  i >= 1; --i) {
  chaos_index_buffer[i] -= chaos_index_buffer[i-1];
}

Since the values can go positive or negative, you next need to shift the data so it is all non-negative.

// First find the min value
int min_idx = INT_MAX;
for(int i = 0; i < num_chaos_indexes; ++i) {
   min_idx = min_idx < chaos_index_buffer[i] ? min_idx : chaos_index_buffer[i];
}

// Then shift it up so it's 0-based.
for(int i = 0; i < num_chaos_indexes; ++i) {
  chaos_index_buffer[i] -= min_idx;
}

Finally, you need to encode the following data...
1) min_idx
2) chaos_bits
3) chaos_index_buffer
4) repeat_table

and you have a few choices for doing that.
1) You could just throw it at Kraken/Zip and let it handle all the heavy lifting.
2) You could just pack the data yourself in a straightforward way.

min_idx is 4 bytes (or 2 bytes if this is a known 16-bit index buffer)
repeat_table is 2 bits per entry
chaos_bits is 1 bit per entry
chaos_index_buffer is some number of bits per entry, typically less than a full 16 or 32 bits - so you can just find the minimum number of bits each entry fits in and encode that directly (see the sketch below).
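
A minimal bit-packing sketch for that last item (illustrative only - the put_bits helper and packed_buffer are made up for this post, not part of any particular library):

// Sketch: pack chaos_index_buffer using the minimum number of bits per entry.
// Assumes packed_buffer is zero-initialized and large enough.
static void put_bits(unsigned char *out, int *bit_pos, unsigned int value, int bits) {
  for(int b = 0; b < bits; ++b, ++*bit_pos) {
    if(value & (1u << b)) out[*bit_pos >> 3] |= 1 << (*bit_pos & 7);
  }
}

int max_val = 0;
for(int i = 0; i < num_chaos_indexes; ++i) {
  max_val = max_val > chaos_index_buffer[i] ? max_val : chaos_index_buffer[i];
}
int bits_per_entry = 1;
while((1 << bits_per_entry) <= max_val) ++bits_per_entry;

int bit_pos = 0;
put_bits(packed_buffer, &bit_pos, bits_per_entry, 6); // store the field width first so the decoder knows it
for(int i = 0; i < num_chaos_indexes; ++i) {
  put_bits(packed_buffer, &bit_pos, chaos_index_buffer[i], bits_per_entry);
}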

Obviously, Kraken/Zip is probably going to get you the biggest wins, but at a higher overall computational cost than simple bit unpacking.

When Kraken encoding, it's probably beneficial to split the high and low bytes so that the compressor doesn't have to do any heavy lifting there (the high bytes should be almost always zero).
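
Something like this minimal sketch (assuming 16-bit values for illustration; low_bytes and high_bytes are hypothetical output arrays):

// Sketch: split 16-bit values into separate low-byte and high-byte streams
// before handing them to Kraken/Zip; the high-byte stream is mostly zeros.
for(int i = 0; i < num_chaos_indexes; ++i) {
  low_bytes[i] = (unsigned char)(chaos_index_buffer[i] & 0xff);
  high_bytes[i] = (unsigned char)((chaos_index_buffer[i] >> 8) & 0xff);
}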

For now I leave the decoding of the data as an exercise for the reader - it's pretty straightforward, just doing the operations in reverse.


Kodak Image set 2x super-resolution'd

12/11/2020

The Kodak set is great, but I needed some higher-resolution versions of the images for some testing - hopefully ones more representative with regards to compression than a straight-up bilinear interpolation - so I ran the set through my super-resolution NN and got some 2x outputs. I figure this would probably be useful to others as well, so I'm posting the data set here for you!
Download: kodak.zip (zip, 84,953 KB)


Oodle Texture BC6H Exposé

6/18/2020

​Oodle Texture is a new technology we've developed at RAD Game Tools which promises to dramatically shrink game sizes, reducing what you need to download and store on disk, and speeding up load times even more.

Oodle Texture is specialized for what are called "block compressed textures". These are a form of compressed image data that is used by GPUs to provide the rendering attributes for surfaces in games. Oodle Texture works on BC1-BC7 textures, sometimes called "BCN textures". The BC1-7 are seven slightly different GPU formats for different bit depths and content types, and most games use a mix of different BCN formats for their textures.

There are normal non-RDO encoders, which are very good maximum-quality encoders, along with an RDO (Rate-Distortion Optimization) mode which can allow your textures to compress further with an additional compressor such as Oodle Kraken or Zip while still maintaining extremely high quality.

In this post, I want to primarily cover the BC6H quality of our non-RDO maximum quality encoders compared to a commonly used alternative.

First though, what is BC6H? BC6H is a 3-channel (RGB) half-float texture format. It turns out that BC6H is the same size as BC7, even though BC7 compresses only 8-bit data while BC6H compresses 16-bit floating point data. The magic that makes this possible is in the details of how it encodes the texture. 
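
(Concretely: both BC6H and BC7 store each 4x4 block in 128 bits, so 128 bits / 16 texels = 8 bits per texel either way.)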

There are two formats to BC6H, a signed format and an unsigned format. This does matter, as the "half-floats" are encoded differently for each. In the unsigned format, the half-float has a precision of 5 bits in the exponent and 11 bits in the mantissa, whereas the signed format has 1 bit specifying positive or negative, 5 bits of exponent, and only 10 bits of mantissa. Thus if your data is always >= 0, you should probably use the unsigned format as you will get better quality out of it. In the typical use cases of BC6H that I am aware of, the data is typically >= 0.

Like all other BCn formats, each texture is broken up into 4x4 blocks, and each BC6H block can be encoded with one of multiple possible encoding modes. The encoding modes, of which there are 14, primarily specify the dynamic range (that is, the minimum and maximum value of all pixels in a block) and the precision of the block in different possible ways.

While some of the modes can cover the entire possible range of a 16-bit half float (at reduced quantized encoding precision), most of them are delta encodings, where you have a base color in the dynamic range and the rest of the colors are offsets from that base color.

The colors themselves specify non-linear lines through the color space for each channel. It's non-linear because the endpoints are specified as the integer values of a half-float, and those integer values are interpolated directly. I.e., when you interpolate the integer value of a half-float, you get a non-linear distribution of colors along that line. (I hope that's clear... it is kind of confusing.)
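
A tiny standalone example of that effect (my own illustration, not anything from the BC6H spec or Oodle's code): interpolating the integer encodings of the half-floats 1.0 and 4.0 lands on 2.0, while interpolating the actual values would give 2.5.

#include <stdint.h>
#include <stdio.h>

// Decode an IEEE 754 half-float bit pattern to a float (finite values only, no inf/nan handling).
static float half_bits_to_float(uint16_t h) {
  uint32_t sign = (h >> 15) & 1;
  uint32_t exp  = (h >> 10) & 0x1f;
  uint32_t man  = h & 0x3ff;
  float val;
  if(exp == 0) val = (float)man / 1024.0f / 16384.0f;                              // denormal: man * 2^-24
  else         val = (1.0f + (float)man / 1024.0f) * ((float)(1u << exp) / 32768.0f); // (1 + man/1024) * 2^(exp-15)
  return sign ? -val : val;
}

int main(void) {
  uint16_t a = 0x3c00; // 1.0 as a half-float
  uint16_t b = 0x4400; // 4.0 as a half-float
  uint16_t mid_bits = (uint16_t)((a + b) / 2); // integer-space midpoint, effectively what BC6H interpolation does
  float mid_value = 0.5f * (half_bits_to_float(a) + half_bits_to_float(b)); // value-space midpoint
  printf("integer-lerp midpoint = %f, value-lerp midpoint = %f\n",
         half_bits_to_float(mid_bits), mid_value); // prints 2.0 vs 2.5
  return 0;
}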

It gets even more complicated and for more information on the specifics see https://docs.microsoft.com/en-us/windows/win32/direct3d11/bc6h-format

Suffice it to say, encoding these things optimally is highly non-trivial. The search space is enormous, and even the choice of how you measure what is good or not is fairly ill-defined for HDR textures. The reason is that straight-up squared error lets errors in bright spots overwhelm all of the surrounding data, prioritizing getting those just right, while the visual system in your eye is essentially logarithmic in intensity response - meaning the brighter the values, the less you see the small differences. Thus squared error really messes up the colors on the edges of bright objects, as it thinks those bright errors are just as important as the darker errors (which is not the case). Your choice of error measure in BC6H is thus very important. We spent a lot of time nailing that down, and it really shows in the quality of results.
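
Purely as an illustration of the idea (this is not the metric Oodle Texture actually uses), an error measure that better matches that logarithmic response can be had by comparing pixels after a log-style transform instead of summing raw squared differences:

#include <math.h>
// Sketch: squared error computed in a log-like domain so bright pixels
// don't dominate the metric. Illustrative only; assumes non-negative inputs.
float hdr_error(const float *a, const float *b, int count) {
  float err = 0.0f;
  for(int i = 0; i < count; ++i) {
    float d = log2f(1.0f + a[i]) - log2f(1.0f + b[i]); // compress the dynamic range before differencing
    err += d * d;
  }
  return err;
}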

This is my favorite example showing off the quality of Oodle Texture. 
[Image: Source Data (Ground Truth)]
[Image: Common BC6H alternative encoder with max quality setting]
[Image: Oodle Texture non-RDO maximum quality encoding]
Additionally, you can do what is called Rate-Distortion Optimization (RDO) which will make smarter encoding choices for a very large gain in compressibility of the data. More on that in a future post.

Charles has a really nice write-up of our RDO encoders here: https://cbloomrants.blogspot.com/2020/06/oodle-texture-slashes-game-sizes.html

(Seriously, go read that then come back)

The original maximum quality DDS texture there can only be compressed by 2%!

Here's the compression ratio table made from various lambda RDO values...

non-RDO:        1.02:1
RDO lambda=10:  1.35:1
RDO lambda=30:  1.64:1
RDO lambda=40:  1.71:1
[Image: Oodle Texture non-RDO]
[Image: Oodle Texture RDO lambda=10]
[Image: Oodle Texture RDO lambda=30]
[Image: Oodle Texture RDO lambda=40]
While those look identical, I assure you there are very subtle differences - but those mostly imperceptible differences make all the difference between no compression and 1.71:1 compression. 

You can read more about Oodle Texture at the RAD Game Tools web site, along with the rest of the Oodle family of data compression solutions.

Oodle and UE4 Loading Time

5/14/2020

I spent some time recently determining the effect of Oodle on UE4 Load Time for various theoretical disk speeds. 

The Oodle compressors can speed up load time in two different ways: first, because they decompress faster, taking less CPU time for decompression; second, by making the data smaller, which saves IO time. When disk speeds are slow, smaller files that save IO time are the primary benefit. When disk speeds are very fast, using less CPU time for decompression is the main factor.

First, I patched the UE4 source to limit the core count to something more reasonable than the full system. I then patched the source to artificially limit the disk IO speed to something specific. The data itself was loaded from a PCIe 4.0 SSD - so very, very fast, and it needed to be artificially limited to reflect the typical performance of, say, a Blu-ray or PS4/XB1 HDD.

Of note, I did not emulate seek time, so the seek time is assumed to be basically instant - so YMMV. Also, real world load times will also be affected by things like disk cache, so we get more useful measurements by simulating disk speed.

Loading in UE4 is the sum of the time taken to load from disk, the time to decompress that data, and overhead time for level loading that's not directly IO or decompression. Depending on how many cores are available, loading from disk and decompressing the data can sometimes be done in parallel - for the purposes of these tests that was minimized through core affinity settings and mutexes.

What are we comparing? 
ZLib and Oodle. If you enable compression for pak files in Unreal, software zlib is used by default. Oodle provides a plugin that drops in and changes the pak file compression. Mostly we care about Oodle's Kraken encoder as it has very desirable performance for its compression ratio, but I included the others (Selkie, Mermaid, Leviathan, Hydra) as well in my testing.

We are measuring three things here.
1) We want to know time to first frame.
2) We want to know how much time total was spent decoding. 
3) We want to know how much time total was spent loading from disk. 

#1 is the most important overall score, but #2 and #3 inform us about how much we can gain from the different options of Oodle Compressors and which one we should use specifically. 

How fast is the PS4/XB1 HDD?
About 65-80 MB/s typical.

How fast is a Blu-ray?
About 10-20 MB/s (though seek times are horrendous)

How did I measure time to first frame?
With RAD Telemetry of course! :) (Seriously invaluable tool if you aren't familiar)

How much data are we loading to get to first frame?
ZLib: ~105 MB
Kraken: ~86 MB

Kraken has less data to load because of higher compression ratio.

First up, just Zlib and Oodle time to first frame...
Read MB/s                 | Zlib | Kraken | Selkie | Mermaid | Leviathan   (time to first frame)
16 MB/s (Blu-ray est)     | 10 s | 7 s    | 8.5 s  | 7 s     | 7 s
64 MB/s (PS4/XB1 HDD est) | 7 s  | 3.75 s | 4 s    | 4 s     | 4.75 s
512 MB/s (fast PC)        | 7 s  | 3.75 s | 3.5 s  | 3.5 s   | 4.5 s
The time it takes to do just the decompression part (not counting disk speed - just decompression time) is also pretty interesting. 
ZLib: 3.88 seconds
Kraken: 1.39 seconds

The other Oodle formats here are as follows with regards to decompression time...
Selkie: 0.24 seconds
Mermaid: 0.64 seconds
Leviathan: 1.82 seconds
Hydra: 1 second

^^ You heard that right. Even Leviathan, which has an LZMA-like compression ratio, is over twice as fast as Zlib here...

In isolation Leviathan can decode 3X faster than Zlib. Here we're timing not an ideal benchmark but the actual usage in Unreal, where the compressed buffers are sometimes small and the overhead means we don't reach the full speeds Oodle is capable of.

The disk IO time (when measured) is basically equivalent to the time to first frame minus the decompression time, plus a second or two depending on how many cores you have working.

In conclusion, Oodle does make a meaningful impact on load times. This is especially true for lower-end devices which have fewer cores, and for systems with HDDs, which are typical for PC & current-gen console games. Presumably the Nintendo Switch will also benefit greatly from Oodle, since game data there is loaded from an SD card and those come in various speeds (sometimes really, really slow).

For more information on Oodle visit http://www.radgametools.com/oodle.htm


It Came From Github #1

4/15/2020

Trying out a new series of blog posts where I talk about different things that I found on github that I think others might also find interesting/useful. This being the first post in the series. This may also be the last post in the series... who knows! 

First up, ray-tracers built in a bunch of different languages - profiled and compared. 
 https://github.com/athas/raytracers 
I found this one somewhat interesting because it had language choices which I was not familiar with, though it missed some other obvious (to me) language choices of course - like straight-up C and/or C++. Note that I am in no way saying the author wrote the code in every language well for speed or clarity - so don't consider this an endorsement or anything ;). Perhaps not surprisingly, of the languages they chose to implement, Rust came out on top. Rust being a systems language made for performance similar to C, this was kind of expected. Still, it's interesting to check out Haskell and a few other uncommon language choices in there. I have a weird fondness for OCaml, in that I like to look at it from afar, but I have never actually used it in a real project yet (I doubt I will), and I thought it was an odd choice to put in this comparison - but maybe not! The implementation in OCaml looks rather simple, but it usually kinda does, which is why I like the language.

Second, there is a database here of Covid-19 chest xrays. 
https://github.com/ieee8023/covid-chestxray-dataset
This could perhaps be used by some DNNs to train for detecting the disease - so perhaps useful for anybody who is interested in using machine learning to help with this disease.

Third, if you spend a lot of time in Linux - this breakdown of the command line has some pretty neat things in it, some of which I knew and some of which I did not.
https://github.com/jlevy/the-art-of-command-line

Fourth, a paper repository. If you are looking for something specific, or just want to learn something new, this might be a good place to start! 
https://github.com/papers-we-love/papers-we-love

Fifth, power toys from Microsoft. In this repo is a bunch of handy utilities to make your development life just a little bit easier. From right-click image resizing, to batch renaming, to new file types supported in the explorer preview pane and more... 
https://github.com/microsoft/PowerToys

That's all for now! Stay safe and Enjoy! 

jo_jpeg Release 1.60

11/27/2019

It's been a long time coming, but I finally got around to implementing sub-sampling of U,V in the JPEG writer. This means many files are 20-30% smaller than before with very little visual quality loss. Sub-sampled UV is enabled automatically for quality levels <= 90. The new code functions exactly like it did before, with the same API as before. Drop it in and enjoy!
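
For anyone curious what the chroma subsample means conceptually, here is a minimal sketch (illustrative only, not the actual jo_jpeg.cpp implementation) of averaging each 2x2 block of a chroma plane down to a single sample:

// Sketch: 2x2 box-filter downsample of a chroma plane (U or V).
// src is w x h, dst is (w/2) x (h/2); w and h assumed even for brevity.
void downsample_chroma_2x2(const unsigned char *src, int w, int h, unsigned char *dst) {
  for(int y = 0; y < h; y += 2) {
    for(int x = 0; x < w; x += 2) {
      int sum = src[y*w + x] + src[y*w + x + 1]
              + src[(y+1)*w + x] + src[(y+1)*w + x + 1];
      dst[(y/2)*(w/2) + (x/2)] = (unsigned char)((sum + 2) / 4); // rounded average
    }
  }
}
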
Download: jo_jpeg.cpp (cpp, 19 KB)


Oodle Lossless Image v1.4.7

4/12/2019

Oodle Lossless Image (OLI) version 1.4.7 was just released. In this release there are lots of improvements - specifically with palettized and 1/2-component images. Also in 1.4.7 is a basic Unity engine integration!

OLI now supports palettized images - up to 2048 unique colors (it could go as high as 64k, but I didn't see a benefit in my test set to going higher than 2048). Implementing this was pretty interesting, in that the order of the colors in the palette matters quite a bit - the reason being that if you get it right, it works with the prediction filters. That is, if the palettized color indexes are linearly predictable, there is a good chance you will get significantly better compression than with just a random ordering. In practice this means trying a bunch of different heuristics (since computing the optimal ordering by brute force is prohibitively expensive). So you sort by luma, or by different channels, or by distance from the last color, for example (picking the most common color as the first one). I also implemented the mZeng palette ordering technique, which isn't commonly found in PNG compressors. Believe it or not, while that theoretically should produce really good results in most cases, sometimes the simpler heuristics win by a lot, so you can't just always use a single method when going for minimum file sizes.
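
As an example of one of the simpler heuristics mentioned above (illustrative only, not OLI's actual code), sorting the palette by luma so that nearby indexes map to similar brightness looks roughly like this:

// Sketch: order a palette by luma so linearly-predicted indexes land on similar colors.
// After sorting you would also remap every pixel's palette index through the new ordering (omitted here).
#include <stdlib.h>
typedef struct { unsigned char r, g, b; } Color;
static int luma_cmp(const void *pa, const void *pb) {
  const Color *a = (const Color*)pa, *b = (const Color*)pb;
  int la = 299*a->r + 587*a->g + 114*a->b; // integer luma, scaled by 1000
  int lb = 299*b->r + 587*b->g + 114*b->b;
  return la - lb;
}
void sort_palette_by_luma(Color *palette, int count) {
  qsort(palette, count, sizeof(Color), luma_cmp);
}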

Examples (some images I've seen used as examples on other sites):
[Image 1]
Source PNG: 36447
pngcrush -brute: 39018
WebP: 25042
FLIF: 24179
OLI --insane: 18909
OLI --super-duper: 18813

[Image 2]
Source PNG: 21603
pngcrush -brute: 21609
WebP: 18978
FLIF: 16114
OLI --insane: 14709
OLI --super-duper: 14881


In all cases, the following arguments were used
pngcrush -brute <input> <output>
cwebp -q 100 -lossless -exact -m 6 -mt <input> -o <output>
flif -E100 -K <input> <output>

Note that insane sometimes doing slightly worse than super-duper can happen due to the layered processes - on average, insane is going to be better.

1/2-component images were just a matter of writing all the various SIMD routines to decode them. Other than that, nothing special here, except that having fewer components means smaller files and faster decoding. I may in the future support more than 4 components if there is demand for that, but for now it's 1, 2, 3, or 4 components of 8 or 16 bits per component.

There were also some general small encoding improvements, and coming up soon are some new color spaces which should further reduce file size. Of specific note is the new encoding flag called "--insane", which actually compresses things to find what's best instead of using heuristics in most places. I use this for dev, but it might be useful for people looking to squeeze out a few more percent in file sizes.

For more information on Oodle Lossless Image visit
http://www.radgametools.com/oodlelimage.htm

Building for OSX,iOS,tvOS & watchOS on Windows

8/1/2018

Note: This is a work-in-progress and still being tested for possible distribution issues. I will update this blog post as the work progresses. 

Trying to simplify my life a bit over here, I am on a journey to eliminate my Mac from the build iteration cycle. The goal is to completely ship all binaries for both Bink and Oodle Lossless Image (OLI) directly from my PC rather than occasionally building on a mac only to find that Apple broke yet another thing in the latest OSX update or iSDK release (seriously, stop that!). 

First things first, you're going to need a toolchain. I used the toolchain from http://www.pmbaty.com/iosbuildenv/ which is claimed to be a native port of the Apple tools from opensource.apple.com/tarballs/.

I also used MSys (via http://mingw.org/) over here so I could have my same build scripts that work on OSX work nearly transparently on Win as well with very little modification.
​
To build for OSX, iOS, tvOS and watchOS you are going to need some sysroots from a real mac. 

You can find these and some frameworks you are going to need in each SDK release at the following paths
  • /Applications/Xcode.app/Contents/Developer/Platforms/AppleTVOS.platform/Developer/SDKs/AppleTVOS{version}.sdk
  • /Applications/Xcode.app/Contents/Developer/Platforms/AppleTVSimulator.platform/Developer/SDKs/AppleTVSimulator{version}.sdk
  • /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS{version}.sdk
  • /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator{version}.sdk
  • /Applications/Xcode.app/Contents/Developer/Platforms/WatchOS.platform/Developer/SDKs/WatchOS{version}.sdk
  • /Applications/Xcode.app/Contents/Developer/Platforms/WatchSimulator.platform/Developer/SDKs/WatchSimulator{version}.sdk
  • /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX{version}.sdk

Next use clang to build for Apple by specifying some additional parameters.

The first of which is your target specification.
  • When building for iOS use "--target=arm-apple-ios10"
  • When building for tvOS use "--target=arm-apple-tvos10"
  • When building for watchOS use "--target=arm-apple-watchos10"
  • When building for iPhoneSimulator use "--target=x86_64-apple-ios10"  
  • When building for AppleTVSimulator use "--target=x86_64-apple-tvos10"  
  • When building for WatchSimulator use "--target=x86_64-apple-watchos10"  
  • When building for 32-bit OSX use "--target=x86-apple-darwin-macho"
  • When building for 64-bit OSX use "--target=x86_64-apple-darwin-macho"

Second, specify your framework directory. This is located in your {SDK}/System/Library/Frameworks directory, so would be specified as "-F{SDK}/System/Library/Frameworks"

Third, you need to specify your sysroot as "--sysroot {SDK}". The sysroot tells the compiler where your headers and libs are. 
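
Putting those three pieces together, a build invocation might look something like the following (the source file name is a placeholder, and {SDK} is wherever you copied the sysroot to - adjust for your setup):

clang --target=arm-apple-ios10 --sysroot {SDK} -F{SDK}/System/Library/Frameworks -c mysource.c -o mysource.o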

That's about it for building stuff (I think?). Just use as normal.

To make a DMG file you need to do things a bit differently since there is no hdiutil on windows as it is closed-source apple tech. 

Instead of hdiutil, you use mkisofs (you can get that with mingw, or it's also provided right here):
Download: mkisofs.exe (exe, 344 KB)

The invocation would look something like:

mkisofs -J -R -o {file}.dmg -mac-name -V "{title}" -apple -v -dir-mode 777 -file-mode 777 {dmg_directory}

As for signing executables, I haven't yet had to worry about that... hoping I won't! I would point you to the pmbaty ios tools which has an executable signer in there. 

If I missed anything, or something is not clear or not working for you, please let me know in the comments below and I'll help if I can!

DagNN vs Standard 2-Layer fully connected networks

4/11/2017

A quick post about the results for my first comparison here of a 2-layer fully connected network vs a DagNN. 

I've removed most of the random variables here for this example so that the comparison is pretty accurate. The only random variable left is the order in which things are trained due to SGD - however, as I removed more and more random variables the differences got more in favor of DagNN and not less.

The conclusion of this test is that DagNN is better node-for-node per epoch than the standard 2-layer fully connected network - at least in this example.
[Image: 64 * 2-layer fully connected network. Solution error at epoch 300 of 685.]
[Image: 128 node DagNN fully connected network. Solution error at epoch 300 of 380.]
This at least follows intuition a bit, that more weights between the same number of nodes increases overall computational power of the network.

More rigorous comparisons on some of the standard test cases need to be done, but this is a good first step offering some preliminary credibility.