iOS (using Xamarin) Hardware Decoder (H264/H265)

Topics: Question
Jun 1, 2016 at 9:20 PM
Beginning with iOS 8.0, access to the hardware decoder became available. Using it is quite easy if you can set the decoder up from an "AVAsset", which is derived from a compatible file or URL for the media. RTSP is not one of those animals, so we are left using more low-level access. Basically, to decode video with the hardware decoder we need to do the steps listed below. I have spent most of my time reading through the discussion board and not so much reviewing the code, so what I am hoping for is some guidance or information on how the various requirements to initialize this decoder, and the separation of NALU units, line up with RtpFrame changes.

The steps to implement seem quite easy (depending on the data received from the RTPStack)...

Here is what I need to do, so any guidance, input, feedback would be appreciated. I will be jumping into the code tonight.

(https://developer.apple.com/videos/play/wwdc2014/513/)

Generate individual network abstraction layer units (NALUs) from your H.264 elementary stream. VCL NALUs (IDR and non-IDR) contain your video data and are to be fed into the decoder.

Re-package those NALUs according to the "AVCC" format, removing NALU start codes and replacing them with a 4-byte NALU length header (a quick sketch of this repackaging follows the list of steps).

Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs via CMVideoFormatDescriptionCreateFromH264ParameterSets()

Package NALU frames as CMSampleBuffers per session 513.

Create a VTDecompressionSessionRef, and feed VTDecompressionSessionDecodeFrame() with the sample buffers

--Alternatively, use AVSampleBufferDisplayLayer, whose -enqueueSampleBuffer: method obviates the need to create your own decoder.
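For step 2, here is a minimal C# sketch of the Annex B to AVCC repackaging I have in mind (the helper name and plumbing are my own, not from this library; it assumes the start codes have already been stripped, e.g. by the RTP depacketizer):

using System.Collections.Generic;
using System.IO;

static class AvccRepackager
{
    // nalus: raw NAL units with Annex B start codes already removed
    public static byte[] ToAvcc(IEnumerable<byte[]> nalus)
    {
        using (var output = new MemoryStream())
        {
            foreach (var nalu in nalus)
            {
                // 4-byte big-endian NALU length header, as VideoToolbox expects for AVCC
                output.WriteByte((byte)(nalu.Length >> 24));
                output.WriteByte((byte)(nalu.Length >> 16));
                output.WriteByte((byte)(nalu.Length >> 8));
                output.WriteByte((byte)nalu.Length);
                output.Write(nalu, 0, nalu.Length);
            }
            return output.ToArray();
        }
    }
}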
Coordinator
Jun 1, 2016 at 10:02 PM
Edited Jun 1, 2016 at 10:04 PM
I can already generate the NALs in RTP format and re-package them from the AVC format; see RFC6184Frame.

I don't remove the start codes yet, as they would only be present in a container; thus BaseMediaReader should implement GetSamples, and then the samples should be verified for proper start codes and passed to RFC6184Frame.Packetize as required.

I can easily create a FormatTypeLine, see SDPUnitTests; the syntax is easy. I will probably eventually add CreateFormatTypeLine to Sdp.MediaDescription to make this task easier and ensure the passed payload type is present in the MediaDescriptionLine.
// Creates a format type (fmtp) line for the given payload type and parameters.
public Sdp.Lines.FormatTypeLine CreateFormatTypeLine(int payloadType, string parameters)
{
    return new Lines.FormatTypeLine(payloadType, parameters);
}

// Creates a format type line for each payload type present in the MediaDescription.
public IEnumerable<Sdp.Lines.FormatTypeLine> CreateFormatTypeLines(string parameters)
{
    foreach (int payloadType in PayloadTypes)
    {
        yield return new Lines.FormatTypeLine(payloadType, parameters);
    }
}
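Hypothetical usage once that helper exists on Sdp.MediaDescription (the payload type and parameters shown are placeholders only):

// e.g. produce a format type line for dynamic payload type 96
Sdp.Lines.FormatTypeLine fmtp = mediaDescription.CreateFormatTypeLine(96, "profile-level-id=42e01f;packetization-mode=1");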
'VTDecompressionSessionRef' is kind of like accessing GPU memory in Windows.

'AVSampleBufferDisplayLayer' is a closer API to MediaCodec on Android (EnqueueBuffer, DequeueBuffer, etc.), and that is how I believe Xamarin already provides support under iOS; see Xamarin's documentation.

I would probably go for a completely C# approach before offering platform-specific code, and even then I would offer platform-specific intrinsics to optimize the process.

There is CSCodec, which has H.264; there is also another repository of the same name within the Mono repositories. Dirac and Theora implementations are present there, as well as a few audio implementations; those can provide somewhat of a basis for decoding without GPU support, which would also be better suited for versions of iOS < 8.

Other users have successfully used the transport layer this library provides, combined with the existing media playback found within Xamarin, to bring playback support to iOS; as a result, playback itself is already hardware accelerated through that route.

If you can't use Xamarin and you're using pure Mono, then performance can be addressed in various ways depending on the model of phone, the accompanying operating system you wish to support within your application, and the build of Mono which accompanies it.
Marked as answer by juliusfriedman on 6/1/2016 at 2:02 PM
Jun 1, 2016 at 10:22 PM
For decoding H264 video (as a raw stream) on iOS, there don't seem to be a lot of options:
  1. FFMPEG (ugly)
  2. CSCodec (somewhat buggy, but it does work to a degree - limited to H264 though)
  3. OpenCV (they want $400 for iOS bindings)
  4. Building a new interface to the built-in hardware decoder
  5. Writing an entirely new H264 and H265 decoder
So there don't seem to be a lot of options here.

I've begun to lean towards building an interface, then allowing "chaining": basically define the Decoder, define the Viewer, and feed the chain.

Underneath, all of the above could be implemented and the user could select what works best. My immediate need is of course iOS and then Android. Being selfish, but also wishing to contribute, I would implement an interface for a DecodingEngine, then build the underlying engines in C# to that interface: first CSCODEC (as it seems to have worked somewhat as a test bed for previous users and can be used on multiple platforms), then the hardware decoder (as in the end I would most likely use that in my iOS or Mac app), then move on to Android and build a decoder/player to the same interfaces.

The end result would be decoders and viewers for Android, PC, iOS and Mac, with H264 on all and H265 where supported.

Think of the old DirectShow pins and such... so many options on codecs (video and audio, as well as input and output variations) make it not so easy, but we have to have something which ultimately allows further thought/building.

Thoughts....
Jun 1, 2016 at 10:25 PM
On a side note, you mention others have done this within iOS - what have they used as a means of doing this? Simply CSCODEC decoding and UIImageView updating? (Or OpenGL display?)

I know of no other way to display a raw H264 stream.

Shawn
Coordinator
Jun 1, 2016 at 10:31 PM
Edited Jun 1, 2016 at 10:33 PM
I am confused about how the implementations would also benefit Android...

I am also confused how you are talking about Android and iOS without Xamarin; are you indeed building Mono yourself and then planning on distributing it in the application within the app stores? Android already has a few C# shells and there are also a few for iOS.

It depends on what your goals and needs are at the current time and how you intend on accomplishing such.

I have no idea what you're trying to do yet...

The end result should be compatible with all operating systems, as the library is; if you need platform-specific interfaces then there has to be something analogous in the library which requires it.

Perhaps you are talking about making platform decoders available in C# in general; if that is the case then you would need to derive from Media.Codec and provide the implementations for that platform within. It seems easy enough and doesn't require any other dependencies, allowing you to choose or fall back as required.

See Xamarin's documentation as I linked to find out how they do it exactly, although I am pretty sure they use whatever native interfaces were available in iOS at the time to provide a suitable way to call the MediaCodec implementation within Java. Where there were no interfaces available they fall back to software.
Marked as answer by juliusfriedman on 6/1/2016 at 2:31 PM
Jun 1, 2016 at 10:54 PM
Sure, let me clarify...
  1. When discussing iOS or Android, I am most definitely speaking of using Xamarin. (Sorry if that was not explicit.)
  2. Whatever methodology/interfaces were used for implementing DecoderEngines for various protocols would be usable on supported platforms. For example, CSCODEC as it stands currently is NOT usable on iOS or Android (though it could be made to be with some changes). Hardware decoding on iOS is different than, say, decoding on Android.
Let's say an interface for decoding engines were defined; I would then go into CSCODEC (branch it) and modify it to conform to the interface for iOS, PC and Mac. Let's say the decode interface was quite simple, just a few properties, and basically had a byte-in/byte-out flow (compressed in, uncompressed out). The decode engine would also need to declare which machines and operating systems it is capable of running on. But for now, I'm not trying to delve that deep.

The iOS hardware decoder would build to this same interface defined above (give it a name, "IOSABC").

The Android Media Codec, the same.

The interface would require pin settings so users can select/set options as needed (property bags), since feeding the data and its format, and retrieving the data and its format, should be selectable but will also be limited by the platform you are on (only the engine knows what it can and can't do).
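To illustrate, here is a purely hypothetical sketch of the kind of interface I mean (every name in it is my own illustration and not anything in this library):

using System;
using System.Collections.Generic;

public interface IDecoderEngine : IDisposable
{
    string Name { get; }                          // e.g. "CSCODEC", "iOS VideoToolbox"
    bool SupportsCurrentPlatform { get; }         // only the engine knows what it can and can't do
    IDictionary<string, object> Options { get; }  // the "property bag" / pin settings
    // Compressed bytes in, uncompressed bytes out; formats are negotiated via Options
    bool Decode(byte[] compressed, int offset, int count, out byte[] uncompressed);
}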

What's driving me to this is that currently there is not an "easy" or demoable sample of anything outside of a full-blown PC or Mac on Mono using System.Drawing or FFMPEG to decode a stream and display it. My needs are mobile, and even further down the chain (embedded ARM etc.), which also have hardware decoders but in no way can support a pure software solution. So having the ability to drop in a decoder which requires little if any change to code is the route I would hope to get to.

My need is simple today:
  1. iOS - open an RTSP stream to a sports cam/webcam
  2. iOS - receive that stream back, decode the RTP
  3. iOS - decode the internal RTP H264 or H265 to get uncompressed frames
  4. iOS - display 30 FPS in HD (Retina display)
1 and 2 this library handles.
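A rough sketch of 1 and 2, assuming the library's Media.Rtsp.RtspClient works roughly like this (the constructor and the StartPlaying/StopPlaying names are my assumptions and may differ by version):

// Hedged sketch only; member names are assumptions about the RtspClient API
using (var rtspClient = new Media.Rtsp.RtspClient("rtsp://camera.local/live"))
{
    rtspClient.StartPlaying();            // OPTIONS/DESCRIBE/SETUP/PLAY, then RTP starts arriving
    System.Threading.Thread.Sleep(10000); // receive for a while; frames surface through the client's events
    rtspClient.StopPlaying();
}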

3 and 4 - nothing available for someone like me to pick up and use easily.

Next steps include VR360 immersive video, but that's a whole other discussion. I need to focus on 1-4 right now.

So my hope was, since I would like to use what you have put together so far and I will be writing 3 and 4 (unless I can find it elsewhere), to contribute back in some way, shape or form.

Hope that helps
Coordinator
Jun 1, 2016 at 11:30 PM
Edited Jun 1, 2016 at 11:40 PM
The software API may be different but that's a semantic I don't need to worry about. If you have Xamarin support it can already encode and decode, and possibly uses hardware acceleration through their build processes and framework derivations.

You're a little ahead of yourself with pin bags and all that; the API is already defined... either use it or make your own hardware interface and call it. It's not going to be easy to recreate DirectShow everywhere, nor should you, since it's easier to just take a more direct approach in your situation unless you're creating a framework or contributing to this one in some meaningful way.

For help with 3 and 4 (the first time they were listed) you should already have support in Xamarin; if not, why? You're trying to mix System.Drawing and the hardware acceleration API calls and I don't understand why; System.Drawing isn't available in Android, iOS or Silverlight, nor is it the preferred API to use anymore.

Finally, many of the other libraries which are available, and possibly incorporated into a Mono runtime, can be used either with or without Xamarin...

What is your issue exactly? Your second list of 1, 2, 3 and 4 is confusing.

See the linked example for how to interface with the GPU in iOS.

Still not sure how that will help the project though...
Marked as answer by juliusfriedman on 6/1/2016 at 3:38 PM
Jun 1, 2016 at 11:47 PM
Unfortunately, MediaCodec for example is Android-only in Xamarin, not iOS.

The iOS SDK only has limited support for streaming formats (e.g. HLS).

The iOS SDK did not have a means to decode H264 independently of, say, an MP4 or HLS until 8.0, which exposed the hardware decoder.

Xamarin did not publish any additional APIs for decoding and playing as such. Android is a different story; much more is available there.

As for 3rd party decoders - FFMPEG, VLC, OpenCV ($$), etc... Nothing simple.

As for accessing OpenGL for image display, that's pretty easy. It's step #3... Remember there is no System.Drawing on iOS either...


So I am left with:
  1. Incorporating FFMPEG, VLC or OpenCV (Not Not Not)
  2. Changing CSCODEC to not use System.Drawing but iOS graphics functions
  3. Implementing direct hardware decoding
If you know of something I do not, please enlighten me.

As for the API being already defined - for decoding? Can you point me to what you speak of? I have the code open and am going through it (of course in Media.Codecs.H264). Looking at this, what would you see as the process/output to decode an RtpFrame utilizing Media.Codecs.H264?

I see:
namespace Media.Codec.Interfaces
{
    public interface IDecoder
    {
    }
}
Nothing defined and such.

Decoders, I assume, need to be registered with the codec somehow, then returned.


Again, all of this is moot if I am missing something that iOS/Xamarin already provides to do this.
Coordinator
Jun 1, 2016 at 11:56 PM
Edited Jun 2, 2016 at 12:12 AM
MediaCodec is an OMX implementation...

libstagefright is the library implementation of the OMX interfaces on Android...

FFmpeg can use libstagefright...

On iOS you can also, but you have AVFoundation etc...


I linked previously to Xamarin support for AvToolkit and their other classes which work under iOS to access encoders and decoders....

Xamarin didn't publish what? They have everything published; what are you looking for?

Where do you see what? That definition looks like it is from my source code and it's provided simply to separate the implementation from Codec if desired...

Possibly you're requesting that I further define methods on the interface, such that the interface can also have Decode and other methods, but I don't see the correlation yet; there would be an OMXCodec which was able to be used with OMXHardware, and that would further provide an OMXDecoder and OMXEncoder which would then implement Codec and Decoder as well as Encoder, and additionally provide their own requirements within their specific respective interfaces.

I eventually plan on adding support for such a paradigm, but obviously the undertaking is quite large enough as it is without first-class hardware support.

Nvidia and ATI as well as Intel interfaces can be made to function under the OMX model; now it's just a question of whether it should, as there is plenty of non-OMX hardware already in existence which would need interfaces defined to make such hardware compatible with the OMX interfaces. It's quite easy to do if there is already a working driver; you just need to make a new wrapper which is acceptable in the platform or environment you're working in.

In this case Xamarin and Apple have done your work for you.

And BTW OMX is OpenMax...
Marked as answer by juliusfriedman on 6/1/2016 at 3:56 PM
Jun 2, 2016 at 12:20 AM

iOS / Xamarin

Then I am completely missing something, as nowhere have I found ANY means of decoding H264 frames. The only means I have found is using the Xamarin bindings for the methods I listed above, which access the hardware decoder. AVFoundation allows for playing media of "known types", not any direct H264 decoding or playing from a buffer or stream. Again, I'm not saying I am completely wrong, but I have not found anything else which could be used.

Using FFMPEG - trying to stay away from that completely.

AVFoundation primarily relies upon AVAsset, which requires a FILE or a URL, on which you can then use, say, AVAssetReader and such. We don't have that here. Part of AVFoundation is also the entryway into using the hardware decoder as well - using VTDecompressionSessionRef.

Please correct my assumptions here if I am wrong, as they don't have a means of decoding raw H264 that I am aware of. *Android is very much different

Your code

As for looking at your code (all of 20 minutes so far): if I were wanting to DECODE H264 received in RTP from your library, what would be the actual class/method I should be looking at? Then I could ask better questions, but at this point I'm just trying to jump through it and get the lay of the structure.

I am looking at Media.Codec.Classes -> the Decoder class, which implements IDecoder
    public class Decoder : Media.Codec.Interfaces.IDecoder
    {
        public Media.Codec.Interfaces.ICodec Codec { get; protected set; }

        //Write to base stream, seek back and call Decode
        //Decode(byte[], int offset, int length)

        //Decode(int offset, int length)
    }
So if I end up needing to implement an H264 decoder (whether it is iOS hardware, CSCODEC, or something else), would I simply inherit IDecoder and write my own methods/properties? Or is there a more defined interface to use, or am I completely off base...
Coordinator
Jun 2, 2016 at 12:43 AM
Edited Jun 2, 2016 at 12:49 AM
Okay, first of all I would like to address the comments / proposals on the interface method declarations you're showing. I have nicely provided enhancements to the code to make it not only correct with respect to the library but to show you how I am using the existing interfaces.

Your problem is that you need to give a compliant container file to the Encoder or Decoder on iOS, and on Windows for that matter.

You need to mux the raw H.264 data into a container file which the Encoder or Decoder understand.

This is pretty easy to do also; although I don't have anything in the library yet, I have made several examples which show it's very easy to achieve.

Possibly we would be able to work together to provide a BaseMediaWriter which would not only suit your needs in iOS but anywhere else such a requirement is made including Desktop or otherwise.

THIS IS THE RESULT, YOU WILL BE ABLE TO SEE THE PICTURE ASSOCIATED WITH VIDEO AND ALSO HEAR AUDIO

If that is what you need then read on... and let's work together!

The reason I suggest this is because of the surface area, which I will expand upon further; we will reap immediate rewards not only in Windows Forms but also Windows Presentation Foundation, Xbox and just about anywhere else in which BaseMedia files are understood (DLNA etc.).

The RtspClient could be created and consume an H.264 video stream, and if desired also an accompanying AAC audio stream (or another codec if really desired), into a compliant and optionally fragmented BaseMedia file which would then be able to be read in a variety of places and not require any further interop or hard work.

Here is your improved starting point for what you were doing... I highly advise you take heed of my advice and expertise, unless you find it interesting to develop such layers and have the time to invest in the research and development of such.

It would be much easier in my humble opinion and much more rewarding both immediately and in terms of capacity to allow writing to a BaseMediaFile.

With a little more work, BaseMediaReader could then also be combined with the BaseMediaWriter for use in the RtspServer to deliver a myriad of media types to end users, optionally falling back to dynamic-style RTP packets which use MPEG transport in their payloads to encompass any further or undefined codec which requires transport in such form.

Let me know how that sounds!
namespace MyApplication
{
    public interface IApplicationCodec : Media.Codec.Interfaces.ICodec
    {
        //Already provided via ICodec
        ///////// <summary>
        ///////// Gets the Guid which uniquely identifies the codec.
        ///////// </summary>
        //////Guid Id { get; }

        ///////// <summary>
        ///////// Gets the string which corresponds to the name of the codec.
        ///////// </summary>
        //////string Name { get; }

        ///////// <summary>
        ///////// The types of media supported by the codec.
        ///////// </summary>
        //////MediaType MediaTypes { get; }

        ///////// <summary>
        ///////// Indicates if the Codec can encode data.
        ///////// </summary>
        //////bool CanEncode { get; }

        ///////// <summary>
        ///////// Indicates if the Codec can decode data.
        ///////// </summary>
        //////bool CanDecode { get; }

        ////////(Try)CreateEncoder maybe better suited with signature (bool, out IEncoder) ...options
        //////IEncoder Encoder { get; }

        ////////(Try)CreateDecoder maybe better suited with signature (bool, out IDecoder) ...options
        //////IDecoder Decoder { get; }
    }

    public interface IApplicationDecoder : Media.Codec.Interfaces.IDecoder
    {
        //This may be okay for your application but it's not required in the library although it may be cool to have an Extension method which can do this for everyone via an IDecoderExtensions class....
        //This is because each Codec will have a unique Id which identifies it, Encoders and Decoders which implement it will be able to be looked up from the Id when registered.
        //This is on purpose to allow for a Codec to be loaded, A Decoder to be created and the Codec itself to be Disposed of and unloaded....     
        //public Media.Codec.Interfaces.ICodec Codec { get; protected set; }

        //Do not write to the base stream and then call Decode on that same stream.
        //Decoders should implement a PriorityQueue or similar which provides the ability to add data properly (e.g. with respect to consumption of the data already being decoded)
        

        //---Write to base stream, seek back and call Decode
        //--Decode(byte[], int offset, int length)

        //Do not have a Decode method which does not work in units relevant to the Decoder, e.g. this may be bytes or floats or RGBA or TextureData etc.
        //--Decode(int offset, int length)        

        //This should be the collection which can be peeked at or cleared from the application.
        ICollection<Media.Common.MemorySegment> Data { get; }

        //Something like this may be appropriate for your needs, where the result indicates if anything was decoded.
        bool Decode(int frames); //-1, 0, etc.
    }

    public interface IApplicationEncoder : Media.Codec.Interfaces.IEncoder
    {
        ///....
    }
}
In addition to being easier, it will be a more reasonable goal to achieve and have finalized in a reasonable amount of time; I would assume no more than 60 days total, including testing, to engineer the requirements adequately depending on the surface areas you need. For instance, we may shoot for plain BaseMedia compliance at first, and then I could add 'rtp' atom support as well as fragmented support, which possibly should even be its own subclass of BaseMediaWriter.

Let me know what you think or if you would like to Skype related to this.

Sincerely,
Julius
Marked as answer by juliusfriedman on 6/1/2016 at 4:43 PM
Jun 2, 2016 at 12:51 AM
Yes, I am familiar with muxing MPEG streams, and somewhat familiar with HLS as well...

Let me think some on this, as I could internally "proxy" a muxed stream.....
Jun 2, 2016 at 12:56 AM
The issue with muxing and such would be keyframes; I need to put some thought into this. iOS is pretty strict on the I-frame requirement at specific intervals for HLS.

Boxing into MP4 is problematic because we don't have the index frames to load first.
Coordinator
Jun 2, 2016 at 1:09 AM
Edited Jun 2, 2016 at 1:18 AM
There is no issue with key frames...

The RtpPacket has a marker bit and the RFC6184Frame has a Depacketize method which gives you the raw RBSP.
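Roughly like this (Depacketize is the method mentioned above; the full class path, the Buffer property and the rtpFrame variable are assumptions from memory and may differ in the current source):

var h264Frame = new Media.Rtsp.Server.MediaTypes.RFC6184Media.RFC6184Frame(rtpFrame);
h264Frame.Depacketize();
byte[] rbsp = h264Frame.Buffer.ToArray(); // raw NAL/RBSP data, ready to wrap in an MDAT or feed to a decoder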

The RBSP can be encapsulated in a MOOV very easily or MOOF for that matter even easier.

Think small, just one frame at first; if you have up to a framerate's worth of frames they can also be included.

Optionally add audio.

Play.

When complete, rinse, wash and repeat; optionally allow buffering, play and volume control, further post-processing or whatever else, with threading.

What's the major malfunction?

Do you want to be able to properly split on I frames..? Don't you think that's the job of the decoder to do from any amount of data buffers..

An easy way would be to do intermediate parsing and buffering before passing to the decoder... Maybe slice headers should also be verified before being passed to the decoder...

...Finalize the atom with the correct sizes and offsets and you're done; feed it to the decoder and you're at exact boundaries.
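For example, a box/atom header is just a 32-bit big-endian size (which includes the 8 header bytes) followed by the four-character type; a minimal sketch, assuming the size fits in 32 bits so no extended size field is needed:

static byte[] BoxHeader(uint totalSize, string fourCc) // e.g. BoxHeader((uint)(8 + payload.Length), "mdat")
{
    var header = new byte[8];
    header[0] = (byte)(totalSize >> 24); // big-endian size, including these 8 header bytes
    header[1] = (byte)(totalSize >> 16);
    header[2] = (byte)(totalSize >> 8);
    header[3] = (byte)totalSize;
    System.Text.Encoding.ASCII.GetBytes(fourCc, 0, 4, header, 4); // 'mdat', 'moof', 'moov', ...
    return header;
}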

FluorineFx does this for RTMP and it works just fine for them...
Marked as answer by juliusfriedman on 6/1/2016 at 5:09 PM
Jun 2, 2016 at 1:20 AM
If you mean writing to an MP4 box, I have code to do that in C#... I did that a couple of years back on a DLNA project. The issue would be that multiple MP4s being played would regenerate the player and flash, etc., etc.

iOS can play MPEG-TS streams... it's what I am thinking about now... The issue would be timestamping on that; I never did figure that part out, but it wouldn't be too extreme to dig back into it.

But I'm not sure about H264 in MPEG-TS...
Coordinator
Jun 2, 2016 at 1:24 AM
Edited Jun 2, 2016 at 1:30 AM
I mean the MOOF or MOOV and all required atoms as well as the samples. It's only the MDAT and one or two others.

The timestamps don't have to match in separate video segments in HLS... if they do then there's an option for that also, but it seems weird to me as they are segments...

The same goes for MP4, if the movies are separate.

Once you have movies you can calculate the total duration of all movies in the player before playing, to eliminate any flashing, or repeat the last frame / skip frames.

H264 in TS is easy: just packetize to NALs and then into TS units. Most Blu-ray players work this way AFAIK; the library can already read that format and any other compliant TS stream / program stream.
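A minimal sketch of the 188-byte transport packet layout (this only shows the 4-byte TS header and payload copy; PES framing, PAT/PMT and PCR handling are assumed to be done elsewhere):

static byte[] CreateTsPacket(int pid, byte continuityCounter, bool payloadUnitStart, byte[] payload)
{
    var packet = new byte[188];
    packet[0] = 0x47;                                                        // sync byte
    packet[1] = (byte)(((payloadUnitStart ? 1 : 0) << 6) | ((pid >> 8) & 0x1F));
    packet[2] = (byte)(pid & 0xFF);
    packet[3] = (byte)(0x10 | (continuityCounter & 0x0F));                   // payload only, no adaptation field
    int count = System.Math.Min(payload.Length, 184);
    System.Array.Copy(payload, 0, packet, 4, count);
    // A real packet with fewer than 184 payload bytes is padded using an adaptation field.
    return packet;
}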

Parsing of the Elementary streams is also started in the code but not completely finished.
Marked as answer by juliusfriedman on 6/1/2016 at 5:24 PM
Coordinator
Jun 9, 2016 at 1:29 PM
The developer (Firlefanz) who was developing for iOS said to reference these threads:

http://stackoverflow.com/questions/29525000/how-to-use-videotoolbox-to-decompress-h-264-video-stream/

http://stackoverflow.com/questions/26035380/how-avsamplebufferdisplaylayer-displays-h-264

Looking at it, it seems like the answer you needed, without the heavy interop, as the cited libraries already do the interop for you.

Hope to hear back!
Marked as answer by juliusfriedman on 6/9/2016 at 5:29 AM