update libchdr
[pcsx_rearmed.git] / deps / libchdr / include / dr_libs / dr_flac.h
CommitLineData
2ff0b512 1/*
2FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
9e052883 3dr_flac - v0.12.39 - 2022-09-17
2ff0b512 4
5David Reid - mackron@gmail.com
6
7GitHub: https://github.com/mackron/dr_libs
8*/
9
10/*
11RELEASE NOTES - v0.12.0
12=======================
13Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
14
15
16Improved Client-Defined Memory Allocation
17-----------------------------------------
18The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
19existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
20allocation callbacks are specified.
21
22To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
23
24 void* my_malloc(size_t sz, void* pUserData)
25 {
26 return malloc(sz);
27 }
28 void* my_realloc(void* p, size_t sz, void* pUserData)
29 {
30 return realloc(p, sz);
31 }
32 void my_free(void* p, void* pUserData)
33 {
34 free(p);
35 }
36
37 ...
38
39 drflac_allocation_callbacks allocationCallbacks;
40 allocationCallbacks.pUserData = &myData;
41 allocationCallbacks.onMalloc = my_malloc;
42 allocationCallbacks.onRealloc = my_realloc;
43 allocationCallbacks.onFree = my_free;
44 drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
45
46The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
47
48Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
49DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
50
51Every API that opens a drflac object now takes this extra parameter. These include the following:
52
53 drflac_open()
54 drflac_open_relaxed()
55 drflac_open_with_metadata()
56 drflac_open_with_metadata_relaxed()
57 drflac_open_file()
58 drflac_open_file_with_metadata()
59 drflac_open_memory()
60 drflac_open_memory_with_metadata()
61 drflac_open_and_read_pcm_frames_s32()
62 drflac_open_and_read_pcm_frames_s16()
63 drflac_open_and_read_pcm_frames_f32()
64 drflac_open_file_and_read_pcm_frames_s32()
65 drflac_open_file_and_read_pcm_frames_s16()
66 drflac_open_file_and_read_pcm_frames_f32()
67 drflac_open_memory_and_read_pcm_frames_s32()
68 drflac_open_memory_and_read_pcm_frames_s16()
69 drflac_open_memory_and_read_pcm_frames_f32()
70
71
72
73Optimizations
74-------------
75Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
76improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
77advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
78means it will be disabled when DR_FLAC_NO_CRC is used.
79
80The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
81particular. 16-bit streams should also see some improvement.
82
83drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
84to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
85
86A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
87channel reconstruction which is the last part of the decoding process.
88
89The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
90compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
91compile time and the REV instruction requires ARM architecture version 6.
92
93An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
94
95
96Removed APIs
97------------
98The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
99
100 drflac_read_s32() -> drflac_read_pcm_frames_s32()
101 drflac_read_s16() -> drflac_read_pcm_frames_s16()
102 drflac_read_f32() -> drflac_read_pcm_frames_f32()
103 drflac_seek_to_sample() -> drflac_seek_to_pcm_frame()
104 drflac_open_and_decode_s32() -> drflac_open_and_read_pcm_frames_s32()
105 drflac_open_and_decode_s16() -> drflac_open_and_read_pcm_frames_s16()
106 drflac_open_and_decode_f32() -> drflac_open_and_read_pcm_frames_f32()
107 drflac_open_and_decode_file_s32() -> drflac_open_file_and_read_pcm_frames_s32()
108 drflac_open_and_decode_file_s16() -> drflac_open_file_and_read_pcm_frames_s16()
109 drflac_open_and_decode_file_f32() -> drflac_open_file_and_read_pcm_frames_f32()
110 drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
111 drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
112 drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
113
114Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
115to the old per-sample APIs. You now need to use the "pcm_frame" versions.
116*/
117
118
119/*
120Introduction
121============
122dr_flac is a single file library. To use it, do something like the following in one .c file.
123
124 ```c
125 #define DR_FLAC_IMPLEMENTATION
126 #include "dr_flac.h"
127 ```
128
129You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
130
131 ```c
132 drflac* pFlac = drflac_open_file("MySong.flac", NULL);
133 if (pFlac == NULL) {
134 // Failed to open FLAC file
135 }
136
137 drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
138 drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
139 ```
140
141The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
142should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
143a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
144
145You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
146samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
147
148 ```c
149 while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
150 do_something();
151 }
152 ```
153
154You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
155
156If you just want to quickly decode an entire FLAC file in one go you can do something like this:
157
158 ```c
159 unsigned int channels;
160 unsigned int sampleRate;
161 drflac_uint64 totalPCMFrameCount;
162 drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
163 if (pSampleData == NULL) {
164 // Failed to open and decode FLAC file.
165 }
166
167 ...
168
169 drflac_free(pSampleData, NULL);
170 ```
171
172You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
173should be considered lossy.
174
175
176If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
177The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
178reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
179
180The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
181streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
182
183 `drflac_open_relaxed()`
184 `drflac_open_with_metadata_relaxed()`
185
186It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
187APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
188
189
190
191Build Options
192=============
193#define these options before including this file.
194
195#define DR_FLAC_NO_STDIO
196 Disable `drflac_open_file()` and family.
197
198#define DR_FLAC_NO_OGG
199 Disables support for Ogg/FLAC streams.
200
201#define DR_FLAC_BUFFER_SIZE <number>
202 Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
203 Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
204 you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
205
206#define DR_FLAC_NO_CRC
207 Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
208 be used if available. Otherwise the seek will be performed using brute force.
209
210#define DR_FLAC_NO_SIMD
211 Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
212
9e052883 213#define DR_FLAC_NO_WCHAR
214 Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.
215
2ff0b512 216
217
218Notes
219=====
220- dr_flac does not support changing the sample rate nor channel count mid stream.
221- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
222- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
223 to differences in corrupted stream recorvery logic between the two APIs.
224*/
225
226#ifndef dr_flac_h
227#define dr_flac_h
228
229#ifdef __cplusplus
230extern "C" {
231#endif
232
233#define DRFLAC_STRINGIFY(x) #x
234#define DRFLAC_XSTRINGIFY(x) DRFLAC_STRINGIFY(x)
235
236#define DRFLAC_VERSION_MAJOR 0
237#define DRFLAC_VERSION_MINOR 12
9e052883 238#define DRFLAC_VERSION_REVISION 39
2ff0b512 239#define DRFLAC_VERSION_STRING DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
240
241#include <stddef.h> /* For size_t. */
242
243/* Sized types. */
244typedef signed char drflac_int8;
245typedef unsigned char drflac_uint8;
246typedef signed short drflac_int16;
247typedef unsigned short drflac_uint16;
248typedef signed int drflac_int32;
249typedef unsigned int drflac_uint32;
9e052883 250#if defined(_MSC_VER) && !defined(__clang__)
2ff0b512 251 typedef signed __int64 drflac_int64;
252 typedef unsigned __int64 drflac_uint64;
253#else
254 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
255 #pragma GCC diagnostic push
256 #pragma GCC diagnostic ignored "-Wlong-long"
257 #if defined(__clang__)
258 #pragma GCC diagnostic ignored "-Wc++11-long-long"
259 #endif
260 #endif
261 typedef signed long long drflac_int64;
262 typedef unsigned long long drflac_uint64;
263 #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
264 #pragma GCC diagnostic pop
265 #endif
266#endif
9e052883 267#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
2ff0b512 268 typedef drflac_uint64 drflac_uintptr;
269#else
270 typedef drflac_uint32 drflac_uintptr;
271#endif
272typedef drflac_uint8 drflac_bool8;
273typedef drflac_uint32 drflac_bool32;
274#define DRFLAC_TRUE 1
275#define DRFLAC_FALSE 0
276
277#if !defined(DRFLAC_API)
278 #if defined(DRFLAC_DLL)
279 #if defined(_WIN32)
280 #define DRFLAC_DLL_IMPORT __declspec(dllimport)
281 #define DRFLAC_DLL_EXPORT __declspec(dllexport)
282 #define DRFLAC_DLL_PRIVATE static
283 #else
284 #if defined(__GNUC__) && __GNUC__ >= 4
285 #define DRFLAC_DLL_IMPORT __attribute__((visibility("default")))
286 #define DRFLAC_DLL_EXPORT __attribute__((visibility("default")))
287 #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
288 #else
289 #define DRFLAC_DLL_IMPORT
290 #define DRFLAC_DLL_EXPORT
291 #define DRFLAC_DLL_PRIVATE static
292 #endif
293 #endif
294
295 #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
296 #define DRFLAC_API DRFLAC_DLL_EXPORT
297 #else
298 #define DRFLAC_API DRFLAC_DLL_IMPORT
299 #endif
300 #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
301 #else
302 #define DRFLAC_API extern
303 #define DRFLAC_PRIVATE static
304 #endif
305#endif
306
307#if defined(_MSC_VER) && _MSC_VER >= 1700 /* Visual Studio 2012 */
308 #define DRFLAC_DEPRECATED __declspec(deprecated)
309#elif (defined(__GNUC__) && __GNUC__ >= 4) /* GCC 4 */
310 #define DRFLAC_DEPRECATED __attribute__((deprecated))
311#elif defined(__has_feature) /* Clang */
312 #if __has_feature(attribute_deprecated)
313 #define DRFLAC_DEPRECATED __attribute__((deprecated))
314 #else
315 #define DRFLAC_DEPRECATED
316 #endif
317#else
318 #define DRFLAC_DEPRECATED
319#endif
320
321DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
322DRFLAC_API const char* drflac_version_string(void);
323
324/*
325As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
326but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
327*/
328#ifndef DR_FLAC_BUFFER_SIZE
329#define DR_FLAC_BUFFER_SIZE 4096
330#endif
331
332/* Check if we can enable 64-bit optimizations. */
333#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
334#define DRFLAC_64BIT
335#endif
336
337#ifdef DRFLAC_64BIT
338typedef drflac_uint64 drflac_cache_t;
339#else
340typedef drflac_uint32 drflac_cache_t;
341#endif
342
343/* The various metadata block types. */
344#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO 0
345#define DRFLAC_METADATA_BLOCK_TYPE_PADDING 1
346#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION 2
347#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE 3
348#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT 4
349#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET 5
350#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE 6
351#define DRFLAC_METADATA_BLOCK_TYPE_INVALID 127
352
353/* The various picture types specified in the PICTURE block. */
354#define DRFLAC_PICTURE_TYPE_OTHER 0
355#define DRFLAC_PICTURE_TYPE_FILE_ICON 1
356#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON 2
357#define DRFLAC_PICTURE_TYPE_COVER_FRONT 3
358#define DRFLAC_PICTURE_TYPE_COVER_BACK 4
359#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE 5
360#define DRFLAC_PICTURE_TYPE_MEDIA 6
361#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST 7
362#define DRFLAC_PICTURE_TYPE_ARTIST 8
363#define DRFLAC_PICTURE_TYPE_CONDUCTOR 9
364#define DRFLAC_PICTURE_TYPE_BAND 10
365#define DRFLAC_PICTURE_TYPE_COMPOSER 11
366#define DRFLAC_PICTURE_TYPE_LYRICIST 12
367#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION 13
368#define DRFLAC_PICTURE_TYPE_DURING_RECORDING 14
369#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE 15
370#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE 16
371#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH 17
372#define DRFLAC_PICTURE_TYPE_ILLUSTRATION 18
373#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE 19
374#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE 20
375
376typedef enum
377{
378 drflac_container_native,
379 drflac_container_ogg,
380 drflac_container_unknown
381} drflac_container;
382
383typedef enum
384{
385 drflac_seek_origin_start,
386 drflac_seek_origin_current
387} drflac_seek_origin;
388
9e052883 389/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
2ff0b512 390typedef struct
391{
392 drflac_uint64 firstPCMFrame;
393 drflac_uint64 flacFrameOffset; /* The offset from the first byte of the header of the first frame. */
394 drflac_uint16 pcmFrameCount;
395} drflac_seekpoint;
2ff0b512 396
397typedef struct
398{
399 drflac_uint16 minBlockSizeInPCMFrames;
400 drflac_uint16 maxBlockSizeInPCMFrames;
401 drflac_uint32 minFrameSizeInPCMFrames;
402 drflac_uint32 maxFrameSizeInPCMFrames;
403 drflac_uint32 sampleRate;
404 drflac_uint8 channels;
405 drflac_uint8 bitsPerSample;
406 drflac_uint64 totalPCMFrameCount;
407 drflac_uint8 md5[16];
408} drflac_streaminfo;
409
410typedef struct
411{
412 /*
413 The metadata type. Use this to know how to interpret the data below. Will be set to one of the
414 DRFLAC_METADATA_BLOCK_TYPE_* tokens.
415 */
416 drflac_uint32 type;
417
418 /*
419 A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
420 not modify the contents of this buffer. Use the structures below for more meaningful and structured
421 information about the metadata. It's possible for this to be null.
422 */
423 const void* pRawData;
424
425 /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
426 drflac_uint32 rawDataSize;
427
428 union
429 {
430 drflac_streaminfo streaminfo;
431
432 struct
433 {
434 int unused;
435 } padding;
436
437 struct
438 {
439 drflac_uint32 id;
440 const void* pData;
441 drflac_uint32 dataSize;
442 } application;
443
444 struct
445 {
446 drflac_uint32 seekpointCount;
447 const drflac_seekpoint* pSeekpoints;
448 } seektable;
449
450 struct
451 {
452 drflac_uint32 vendorLength;
453 const char* vendor;
454 drflac_uint32 commentCount;
455 const void* pComments;
456 } vorbis_comment;
457
458 struct
459 {
460 char catalog[128];
461 drflac_uint64 leadInSampleCount;
462 drflac_bool32 isCD;
463 drflac_uint8 trackCount;
464 const void* pTrackData;
465 } cuesheet;
466
467 struct
468 {
469 drflac_uint32 type;
470 drflac_uint32 mimeLength;
471 const char* mime;
472 drflac_uint32 descriptionLength;
473 const char* description;
474 drflac_uint32 width;
475 drflac_uint32 height;
476 drflac_uint32 colorDepth;
477 drflac_uint32 indexColorCount;
478 drflac_uint32 pictureDataSize;
479 const drflac_uint8* pPictureData;
480 } picture;
481 } data;
482} drflac_metadata;
483
484
485/*
486Callback for when data needs to be read from the client.
487
488
489Parameters
490----------
491pUserData (in)
492 The user data that was passed to drflac_open() and family.
493
494pBufferOut (out)
495 The output buffer.
496
497bytesToRead (in)
498 The number of bytes to read.
499
500
501Return Value
502------------
503The number of bytes actually read.
504
505
506Remarks
507-------
508A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
509you have reached the end of the stream.
510*/
511typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
512
513/*
514Callback for when data needs to be seeked.
515
516
517Parameters
518----------
519pUserData (in)
520 The user data that was passed to drflac_open() and family.
521
522offset (in)
523 The number of bytes to move, relative to the origin. Will never be negative.
524
525origin (in)
526 The origin of the seek - the current position or the start of the stream.
527
528
529Return Value
530------------
531Whether or not the seek was successful.
532
533
534Remarks
535-------
536The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
537either drflac_seek_origin_start or drflac_seek_origin_current.
538
539When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
540and handled by returning DRFLAC_FALSE.
541*/
542typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
543
544/*
545Callback for when a metadata block is read.
546
547
548Parameters
549----------
550pUserData (in)
551 The user data that was passed to drflac_open() and family.
552
553pMetadata (in)
554 A pointer to a structure containing the data of the metadata block.
555
556
557Remarks
558-------
559Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
560will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
561*/
562typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
563
564
565typedef struct
566{
567 void* pUserData;
568 void* (* onMalloc)(size_t sz, void* pUserData);
569 void* (* onRealloc)(void* p, size_t sz, void* pUserData);
570 void (* onFree)(void* p, void* pUserData);
571} drflac_allocation_callbacks;
572
573/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
574typedef struct
575{
576 const drflac_uint8* data;
577 size_t dataSize;
578 size_t currentReadPos;
579} drflac__memory_stream;
580
581/* Structure for internal use. Used for bit streaming. */
582typedef struct
583{
584 /* The function to call when more data needs to be read. */
585 drflac_read_proc onRead;
586
587 /* The function to call when the current read position needs to be moved. */
588 drflac_seek_proc onSeek;
589
590 /* The user data to pass around to onRead and onSeek. */
591 void* pUserData;
592
593
594 /*
595 The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
596 stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
597 or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
598 */
599 size_t unalignedByteCount;
600
601 /* The content of the unaligned bytes. */
602 drflac_cache_t unalignedCache;
603
604 /* The index of the next valid cache line in the "L2" cache. */
605 drflac_uint32 nextL2Line;
606
607 /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
608 drflac_uint32 consumedBits;
609
610 /*
611 The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
612 Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
613 */
614 drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
615 drflac_cache_t cache;
616
617 /*
618 CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
619 is reset to 0 at the beginning of each frame.
620 */
621 drflac_uint16 crc16;
622 drflac_cache_t crc16Cache; /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
623 drflac_uint32 crc16CacheIgnoredBytes; /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
624} drflac_bs;
625
626typedef struct
627{
628 /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
629 drflac_uint8 subframeType;
630
631 /* The number of wasted bits per sample as specified by the sub-frame header. */
632 drflac_uint8 wastedBitsPerSample;
633
634 /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
635 drflac_uint8 lpcOrder;
636
637 /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
638 drflac_int32* pSamplesS32;
639} drflac_subframe;
640
641typedef struct
642{
643 /*
644 If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
645 always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
646 */
647 drflac_uint64 pcmFrameNumber;
648
649 /*
650 If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
651 is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
652 */
653 drflac_uint32 flacFrameNumber;
654
655 /* The sample rate of this frame. */
656 drflac_uint32 sampleRate;
657
658 /* The number of PCM frames in each sub-frame within this frame. */
659 drflac_uint16 blockSizeInPCMFrames;
660
661 /*
662 The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
663 will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
664 */
665 drflac_uint8 channelAssignment;
666
667 /* The number of bits per sample within this frame. */
668 drflac_uint8 bitsPerSample;
669
670 /* The frame's CRC. */
671 drflac_uint8 crc8;
672} drflac_frame_header;
673
674typedef struct
675{
676 /* The header. */
677 drflac_frame_header header;
678
679 /*
680 The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
681 this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
682 */
683 drflac_uint32 pcmFramesRemaining;
684
685 /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
686 drflac_subframe subframes[8];
687} drflac_frame;
688
689typedef struct
690{
691 /* The function to call when a metadata block is read. */
692 drflac_meta_proc onMeta;
693
694 /* The user data posted to the metadata callback function. */
695 void* pUserDataMD;
696
697 /* Memory allocation callbacks. */
698 drflac_allocation_callbacks allocationCallbacks;
699
700
701 /* The sample rate. Will be set to something like 44100. */
702 drflac_uint32 sampleRate;
703
704 /*
705 The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
706 value specified in the STREAMINFO block.
707 */
708 drflac_uint8 channels;
709
710 /* The bits per sample. Will be set to something like 16, 24, etc. */
711 drflac_uint8 bitsPerSample;
712
713 /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
714 drflac_uint16 maxBlockSizeInPCMFrames;
715
716 /*
717 The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
718 the total PCM frame count is unknown. Likely the case with streams like internet radio.
719 */
720 drflac_uint64 totalPCMFrameCount;
721
722
723 /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
724 drflac_container container;
725
726 /* The number of seekpoints in the seektable. */
727 drflac_uint32 seekpointCount;
728
729
730 /* Information about the frame the decoder is currently sitting on. */
731 drflac_frame currentFLACFrame;
732
733
734 /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
735 drflac_uint64 currentPCMFrame;
736
737 /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
738 drflac_uint64 firstFLACFramePosInBytes;
739
740
741 /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
742 drflac__memory_stream memoryStream;
743
744
745 /* A pointer to the decoded sample data. This is an offset of pExtraData. */
746 drflac_int32* pDecodedSamples;
747
748 /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
749 drflac_seekpoint* pSeekpoints;
750
751 /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
752 void* _oggbs;
753
754 /* Internal use only. Used for profiling and testing different seeking modes. */
755 drflac_bool32 _noSeekTableSeek : 1;
756 drflac_bool32 _noBinarySearchSeek : 1;
757 drflac_bool32 _noBruteForceSeek : 1;
758
759 /* The bit streamer. The raw FLAC data is fed through this object. */
760 drflac_bs bs;
761
762 /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
763 drflac_uint8 pExtraData[1];
764} drflac;
765
766
767/*
768Opens a FLAC decoder.
769
770
771Parameters
772----------
773onRead (in)
774 The function to call when data needs to be read from the client.
775
776onSeek (in)
777 The function to call when the read position of the client data needs to move.
778
779pUserData (in, optional)
780 A pointer to application defined data that will be passed to onRead and onSeek.
781
782pAllocationCallbacks (in, optional)
783 A pointer to application defined callbacks for managing memory allocations.
784
785
786Return Value
787------------
788Returns a pointer to an object representing the decoder.
789
790
791Remarks
792-------
793Close the decoder with `drflac_close()`.
794
795`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
796
797This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
798without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
799
800This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
801from a block of memory respectively.
802
803The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
804
805Use `drflac_open_with_metadata()` if you need access to metadata.
806
807
808Seek Also
809---------
810drflac_open_file()
811drflac_open_memory()
812drflac_open_with_metadata()
813drflac_close()
814*/
815DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
816
817/*
818Opens a FLAC stream with relaxed validation of the header block.
819
820
821Parameters
822----------
823onRead (in)
824 The function to call when data needs to be read from the client.
825
826onSeek (in)
827 The function to call when the read position of the client data needs to move.
828
829container (in)
830 Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
831
832pUserData (in, optional)
833 A pointer to application defined data that will be passed to onRead and onSeek.
834
835pAllocationCallbacks (in, optional)
836 A pointer to application defined callbacks for managing memory allocations.
837
838
839Return Value
840------------
841A pointer to an object representing the decoder.
842
843
844Remarks
845-------
846The same as drflac_open(), except attempts to open the stream even when a header block is not present.
847
848Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
849as that is for internal use only.
850
851Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
852force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
853
854Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
855*/
856DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
857
858/*
859Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
860
861
862Parameters
863----------
864onRead (in)
865 The function to call when data needs to be read from the client.
866
867onSeek (in)
868 The function to call when the read position of the client data needs to move.
869
870onMeta (in)
871 The function to call for every metadata block.
872
873pUserData (in, optional)
874 A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
875
876pAllocationCallbacks (in, optional)
877 A pointer to application defined callbacks for managing memory allocations.
878
879
880Return Value
881------------
882A pointer to an object representing the decoder.
883
884
885Remarks
886-------
887Close the decoder with `drflac_close()`.
888
889`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
890
891This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
892metadata block except for STREAMINFO and PADDING blocks.
893
894The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
895pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
896the different metadata types.
897
898The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
899
900Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
901the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
902metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
903returned depending on whether or not the stream is being opened with metadata.
904
905
906Seek Also
907---------
908drflac_open_file_with_metadata()
909drflac_open_memory_with_metadata()
910drflac_open()
911drflac_close()
912*/
913DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
914
915/*
916The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
917
918See Also
919--------
920drflac_open_with_metadata()
921drflac_open_relaxed()
922*/
923DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
924
925/*
926Closes the given FLAC decoder.
927
928
929Parameters
930----------
931pFlac (in)
932 The decoder to close.
933
934
935Remarks
936-------
937This will destroy the decoder object.
938
939
940See Also
941--------
942drflac_open()
943drflac_open_with_metadata()
944drflac_open_file()
945drflac_open_file_w()
946drflac_open_file_with_metadata()
947drflac_open_file_with_metadata_w()
948drflac_open_memory()
949drflac_open_memory_with_metadata()
950*/
951DRFLAC_API void drflac_close(drflac* pFlac);
952
953
954/*
955Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
956
957
958Parameters
959----------
960pFlac (in)
961 The decoder.
962
963framesToRead (in)
964 The number of PCM frames to read.
965
966pBufferOut (out, optional)
967 A pointer to the buffer that will receive the decoded samples.
968
969
970Return Value
971------------
972Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
973
974
975Remarks
976-------
977pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
978*/
979DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
980
981
982/*
983Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
984
985
986Parameters
987----------
988pFlac (in)
989 The decoder.
990
991framesToRead (in)
992 The number of PCM frames to read.
993
994pBufferOut (out, optional)
995 A pointer to the buffer that will receive the decoded samples.
996
997
998Return Value
999------------
1000Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1001
1002
1003Remarks
1004-------
1005pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1006
1007Note that this is lossy for streams where the bits per sample is larger than 16.
1008*/
1009DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
1010
1011/*
1012Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
1013
1014
1015Parameters
1016----------
1017pFlac (in)
1018 The decoder.
1019
1020framesToRead (in)
1021 The number of PCM frames to read.
1022
1023pBufferOut (out, optional)
1024 A pointer to the buffer that will receive the decoded samples.
1025
1026
1027Return Value
1028------------
1029Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
1030
1031
1032Remarks
1033-------
1034pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
1035
1036Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
1037*/
1038DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
1039
1040/*
1041Seeks to the PCM frame at the given index.
1042
1043
1044Parameters
1045----------
1046pFlac (in)
1047 The decoder.
1048
1049pcmFrameIndex (in)
1050 The index of the PCM frame to seek to. See notes below.
1051
1052
1053Return Value
1054-------------
1055`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
1056*/
1057DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
1058
9e052883 1059
1060
1061#ifndef DR_FLAC_NO_STDIO
1062/*
1063Opens a FLAC decoder from the file at the given path.
1064
1065
1066Parameters
1067----------
1068pFileName (in)
1069 The path of the file to open, either absolute or relative to the current directory.
1070
1071pAllocationCallbacks (in, optional)
1072 A pointer to application defined callbacks for managing memory allocations.
1073
1074
1075Return Value
1076------------
1077A pointer to an object representing the decoder.
1078
1079
1080Remarks
1081-------
1082Close the decoder with drflac_close().
1083
1084
1085Remarks
1086-------
1087This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
1088at any given time, so keep this mind if you have many decoders open at the same time.
1089
1090
1091See Also
1092--------
1093drflac_open_file_with_metadata()
1094drflac_open()
1095drflac_close()
1096*/
1097DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1098DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
1099
1100/*
1101Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
1102
1103
1104Parameters
1105----------
1106pFileName (in)
1107 The path of the file to open, either absolute or relative to the current directory.
1108
1109pAllocationCallbacks (in, optional)
1110 A pointer to application defined callbacks for managing memory allocations.
1111
1112onMeta (in)
1113 The callback to fire for each metadata block.
1114
1115pUserData (in)
1116 A pointer to the user data to pass to the metadata callback.
1117
1118pAllocationCallbacks (in)
1119 A pointer to application defined callbacks for managing memory allocations.
1120
1121
1122Remarks
1123-------
1124Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1125
1126
1127See Also
1128--------
1129drflac_open_with_metadata()
1130drflac_open()
1131drflac_close()
1132*/
1133DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1134DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1135#endif
1136
2ff0b512 1137/*
1138Opens a FLAC decoder from a pre-allocated block of memory
1139
1140
1141Parameters
1142----------
1143pData (in)
1144 A pointer to the raw encoded FLAC data.
1145
1146dataSize (in)
1147 The size in bytes of `data`.
1148
1149pAllocationCallbacks (in)
1150 A pointer to application defined callbacks for managing memory allocations.
1151
1152
1153Return Value
1154------------
1155A pointer to an object representing the decoder.
1156
1157
1158Remarks
1159-------
1160This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
1161
1162
1163See Also
1164--------
1165drflac_open()
1166drflac_close()
1167*/
1168DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
1169
1170/*
1171Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
1172
1173
1174Parameters
1175----------
1176pData (in)
1177 A pointer to the raw encoded FLAC data.
1178
1179dataSize (in)
1180 The size in bytes of `data`.
1181
1182onMeta (in)
1183 The callback to fire for each metadata block.
1184
1185pUserData (in)
1186 A pointer to the user data to pass to the metadata callback.
1187
1188pAllocationCallbacks (in)
1189 A pointer to application defined callbacks for managing memory allocations.
1190
1191
1192Remarks
1193-------
1194Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
1195
1196
1197See Also
1198-------
1199drflac_open_with_metadata()
1200drflac_open()
1201drflac_close()
1202*/
1203DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
1204
1205
1206
1207/* High Level APIs */
1208
1209/*
1210Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
1211pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
1212
1213You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
1214case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
1215
1216Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
1217read samples into a dynamically sized buffer on the heap until no samples are left.
1218
1219Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
1220*/
1221DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1222
1223/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1224DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1225
1226/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1227DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1228
9e052883 1229#ifndef DR_FLAC_NO_STDIO
1230/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
1231DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1232
1233/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1234DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1235
1236/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1237DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1238#endif
1239
2ff0b512 1240/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
1241DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1242
1243/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
1244DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1245
1246/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
1247DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
1248
1249/*
1250Frees memory that was allocated internally by dr_flac.
1251
1252Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
1253*/
1254DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
1255
1256
1257/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
1258typedef struct
1259{
1260 drflac_uint32 countRemaining;
1261 const char* pRunningData;
1262} drflac_vorbis_comment_iterator;
1263
1264/*
1265Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
1266metadata block.
1267*/
1268DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
1269
1270/*
1271Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
1272returned string is NOT null terminated.
1273*/
1274DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
1275
1276
1277/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
1278typedef struct
1279{
1280 drflac_uint32 countRemaining;
1281 const char* pRunningData;
1282} drflac_cuesheet_track_iterator;
1283
9e052883 1284/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
2ff0b512 1285typedef struct
1286{
1287 drflac_uint64 offset;
1288 drflac_uint8 index;
1289 drflac_uint8 reserved[3];
1290} drflac_cuesheet_track_index;
2ff0b512 1291
1292typedef struct
1293{
1294 drflac_uint64 offset;
1295 drflac_uint8 trackNumber;
1296 char ISRC[12];
1297 drflac_bool8 isAudio;
1298 drflac_bool8 preEmphasis;
1299 drflac_uint8 indexCount;
1300 const drflac_cuesheet_track_index* pIndexPoints;
1301} drflac_cuesheet_track;
1302
1303/*
1304Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
1305block.
1306*/
1307DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
1308
1309/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
1310DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
1311
1312
1313#ifdef __cplusplus
1314}
1315#endif
1316#endif /* dr_flac_h */
1317
1318
1319/************************************************************************************************************************************************************
1320 ************************************************************************************************************************************************************
1321
1322 IMPLEMENTATION
1323
1324 ************************************************************************************************************************************************************
1325 ************************************************************************************************************************************************************/
1326#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
1327#ifndef dr_flac_c
1328#define dr_flac_c
1329
1330/* Disable some annoying warnings. */
1331#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
1332 #pragma GCC diagnostic push
1333 #if __GNUC__ >= 7
1334 #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
1335 #endif
1336#endif
1337
1338#ifdef __linux__
1339 #ifndef _BSD_SOURCE
1340 #define _BSD_SOURCE
1341 #endif
1342 #ifndef _DEFAULT_SOURCE
1343 #define _DEFAULT_SOURCE
1344 #endif
1345 #ifndef __USE_BSD
1346 #define __USE_BSD
1347 #endif
1348 #include <endian.h>
1349#endif
1350
1351#include <stdlib.h>
1352#include <string.h>
1353
1354#ifdef _MSC_VER
1355 #define DRFLAC_INLINE __forceinline
1356#elif defined(__GNUC__)
1357 /*
1358 I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
1359 the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
1360 case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
1361 command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
1362 I am using "__inline__" only when we're compiling in strict ANSI mode.
1363 */
1364 #if defined(__STRICT_ANSI__)
9e052883 1365 #define DRFLAC_GNUC_INLINE_HINT __inline__
1366 #else
1367 #define DRFLAC_GNUC_INLINE_HINT inline
1368 #endif
1369
1370 #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
1371 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
2ff0b512 1372 #else
9e052883 1373 #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
2ff0b512 1374 #endif
1375#elif defined(__WATCOMC__)
1376 #define DRFLAC_INLINE __inline
1377#else
1378 #define DRFLAC_INLINE
1379#endif
1380
1381/* CPU architecture. */
1382#if defined(__x86_64__) || defined(_M_X64)
1383 #define DRFLAC_X64
1384#elif defined(__i386) || defined(_M_IX86)
1385 #define DRFLAC_X86
9e052883 1386#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
2ff0b512 1387 #define DRFLAC_ARM
1388#endif
1389
1390/*
1391Intrinsics Support
1392
1393There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
1394
1395 "error: shift must be an immediate"
1396
1397Unfortuantely dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
1398*/
1399#if !defined(DR_FLAC_NO_SIMD)
1400 #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
1401 #if defined(_MSC_VER) && !defined(__clang__)
1402 /* MSVC. */
1403 #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2) /* 2005 */
1404 #define DRFLAC_SUPPORT_SSE2
1405 #endif
1406 #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41) /* 2010 */
1407 #define DRFLAC_SUPPORT_SSE41
1408 #endif
1409 #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
1410 /* Assume GNUC-style. */
1411 #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
1412 #define DRFLAC_SUPPORT_SSE2
1413 #endif
1414 #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
1415 #define DRFLAC_SUPPORT_SSE41
1416 #endif
1417 #endif
1418
1419 /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
1420 #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
1421 #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
1422 #define DRFLAC_SUPPORT_SSE2
1423 #endif
1424 #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
1425 #define DRFLAC_SUPPORT_SSE41
1426 #endif
1427 #endif
1428
1429 #if defined(DRFLAC_SUPPORT_SSE41)
1430 #include <smmintrin.h>
1431 #elif defined(DRFLAC_SUPPORT_SSE2)
1432 #include <emmintrin.h>
1433 #endif
1434 #endif
1435
1436 #if defined(DRFLAC_ARM)
1437 #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1438 #define DRFLAC_SUPPORT_NEON
2ff0b512 1439 #include <arm_neon.h>
1440 #endif
1441 #endif
1442#endif
1443
1444/* Compile-time CPU feature support. */
1445#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
1446 #if defined(_MSC_VER) && !defined(__clang__)
1447 #if _MSC_VER >= 1400
1448 #include <intrin.h>
1449 static void drflac__cpuid(int info[4], int fid)
1450 {
1451 __cpuid(info, fid);
1452 }
1453 #else
1454 #define DRFLAC_NO_CPUID
1455 #endif
1456 #else
1457 #if defined(__GNUC__) || defined(__clang__)
1458 static void drflac__cpuid(int info[4], int fid)
1459 {
1460 /*
1461 It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
1462 specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
1463 supporting different assembly dialects.
1464
1465 What's basically happening is that we're saving and restoring the ebx register manually.
1466 */
1467 #if defined(DRFLAC_X86) && defined(__PIC__)
1468 __asm__ __volatile__ (
1469 "xchg{l} {%%}ebx, %k1;"
1470 "cpuid;"
1471 "xchg{l} {%%}ebx, %k1;"
1472 : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1473 );
1474 #else
1475 __asm__ __volatile__ (
1476 "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
1477 );
1478 #endif
1479 }
1480 #else
1481 #define DRFLAC_NO_CPUID
1482 #endif
1483 #endif
1484#else
1485 #define DRFLAC_NO_CPUID
1486#endif
1487
1488static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
1489{
1490#if defined(DRFLAC_SUPPORT_SSE2)
1491 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
1492 #if defined(DRFLAC_X64)
1493 return DRFLAC_TRUE; /* 64-bit targets always support SSE2. */
1494 #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
1495 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
1496 #else
1497 #if defined(DRFLAC_NO_CPUID)
1498 return DRFLAC_FALSE;
1499 #else
1500 int info[4];
1501 drflac__cpuid(info, 1);
1502 return (info[3] & (1 << 26)) != 0;
1503 #endif
1504 #endif
1505 #else
1506 return DRFLAC_FALSE; /* SSE2 is only supported on x86 and x64 architectures. */
1507 #endif
1508#else
1509 return DRFLAC_FALSE; /* No compiler support. */
1510#endif
1511}
1512
1513static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
1514{
1515#if defined(DRFLAC_SUPPORT_SSE41)
1516 #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
9e052883 1517 #if defined(__SSE4_1__) || defined(__AVX__)
2ff0b512 1518 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
1519 #else
1520 #if defined(DRFLAC_NO_CPUID)
1521 return DRFLAC_FALSE;
1522 #else
1523 int info[4];
1524 drflac__cpuid(info, 1);
1525 return (info[2] & (1 << 19)) != 0;
1526 #endif
1527 #endif
1528 #else
1529 return DRFLAC_FALSE; /* SSE41 is only supported on x86 and x64 architectures. */
1530 #endif
1531#else
1532 return DRFLAC_FALSE; /* No compiler support. */
1533#endif
1534}
1535
1536
1537#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
1538 #define DRFLAC_HAS_LZCNT_INTRINSIC
1539#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
1540 #define DRFLAC_HAS_LZCNT_INTRINSIC
1541#elif defined(__clang__)
1542 #if defined(__has_builtin)
1543 #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
1544 #define DRFLAC_HAS_LZCNT_INTRINSIC
1545 #endif
1546 #endif
1547#endif
1548
1549#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
1550 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1551 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1552 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1553#elif defined(__clang__)
1554 #if defined(__has_builtin)
1555 #if __has_builtin(__builtin_bswap16)
1556 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1557 #endif
1558 #if __has_builtin(__builtin_bswap32)
1559 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1560 #endif
1561 #if __has_builtin(__builtin_bswap64)
1562 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1563 #endif
1564 #endif
1565#elif defined(__GNUC__)
1566 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
1567 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1568 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1569 #endif
1570 #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
1571 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1572 #endif
9e052883 1573#elif defined(__WATCOMC__) && defined(__386__)
1574 #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
1575 #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
1576 #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
1577 extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
1578 extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
1579 extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
1580#pragma aux _watcom_bswap16 = \
1581 "xchg al, ah" \
1582 parm [ax] \
1583 value [ax] \
1584 modify nomemory;
1585#pragma aux _watcom_bswap32 = \
1586 "bswap eax" \
1587 parm [eax] \
1588 value [eax] \
1589 modify nomemory;
1590#pragma aux _watcom_bswap64 = \
1591 "bswap eax" \
1592 "bswap edx" \
1593 "xchg eax,edx" \
1594 parm [eax edx] \
1595 value [eax edx] \
1596 modify nomemory;
2ff0b512 1597#endif
1598
1599
1600/* Standard library stuff. */
1601#ifndef DRFLAC_ASSERT
1602#include <assert.h>
1603#define DRFLAC_ASSERT(expression) assert(expression)
1604#endif
1605#ifndef DRFLAC_MALLOC
1606#define DRFLAC_MALLOC(sz) malloc((sz))
1607#endif
1608#ifndef DRFLAC_REALLOC
1609#define DRFLAC_REALLOC(p, sz) realloc((p), (sz))
1610#endif
1611#ifndef DRFLAC_FREE
1612#define DRFLAC_FREE(p) free((p))
1613#endif
1614#ifndef DRFLAC_COPY_MEMORY
1615#define DRFLAC_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
1616#endif
1617#ifndef DRFLAC_ZERO_MEMORY
1618#define DRFLAC_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
1619#endif
1620#ifndef DRFLAC_ZERO_OBJECT
1621#define DRFLAC_ZERO_OBJECT(p) DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
1622#endif
1623
1624#define DRFLAC_MAX_SIMD_VECTOR_SIZE 64 /* 64 for AVX-512 in the future. */
1625
1626typedef drflac_int32 drflac_result;
1627#define DRFLAC_SUCCESS 0
1628#define DRFLAC_ERROR -1 /* A generic error. */
1629#define DRFLAC_INVALID_ARGS -2
1630#define DRFLAC_INVALID_OPERATION -3
1631#define DRFLAC_OUT_OF_MEMORY -4
1632#define DRFLAC_OUT_OF_RANGE -5
1633#define DRFLAC_ACCESS_DENIED -6
1634#define DRFLAC_DOES_NOT_EXIST -7
1635#define DRFLAC_ALREADY_EXISTS -8
1636#define DRFLAC_TOO_MANY_OPEN_FILES -9
1637#define DRFLAC_INVALID_FILE -10
1638#define DRFLAC_TOO_BIG -11
1639#define DRFLAC_PATH_TOO_LONG -12
1640#define DRFLAC_NAME_TOO_LONG -13
1641#define DRFLAC_NOT_DIRECTORY -14
1642#define DRFLAC_IS_DIRECTORY -15
1643#define DRFLAC_DIRECTORY_NOT_EMPTY -16
1644#define DRFLAC_END_OF_FILE -17
1645#define DRFLAC_NO_SPACE -18
1646#define DRFLAC_BUSY -19
1647#define DRFLAC_IO_ERROR -20
1648#define DRFLAC_INTERRUPT -21
1649#define DRFLAC_UNAVAILABLE -22
1650#define DRFLAC_ALREADY_IN_USE -23
1651#define DRFLAC_BAD_ADDRESS -24
1652#define DRFLAC_BAD_SEEK -25
1653#define DRFLAC_BAD_PIPE -26
1654#define DRFLAC_DEADLOCK -27
1655#define DRFLAC_TOO_MANY_LINKS -28
1656#define DRFLAC_NOT_IMPLEMENTED -29
1657#define DRFLAC_NO_MESSAGE -30
1658#define DRFLAC_BAD_MESSAGE -31
1659#define DRFLAC_NO_DATA_AVAILABLE -32
1660#define DRFLAC_INVALID_DATA -33
1661#define DRFLAC_TIMEOUT -34
1662#define DRFLAC_NO_NETWORK -35
1663#define DRFLAC_NOT_UNIQUE -36
1664#define DRFLAC_NOT_SOCKET -37
1665#define DRFLAC_NO_ADDRESS -38
1666#define DRFLAC_BAD_PROTOCOL -39
1667#define DRFLAC_PROTOCOL_UNAVAILABLE -40
1668#define DRFLAC_PROTOCOL_NOT_SUPPORTED -41
1669#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED -42
1670#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED -43
1671#define DRFLAC_SOCKET_NOT_SUPPORTED -44
1672#define DRFLAC_CONNECTION_RESET -45
1673#define DRFLAC_ALREADY_CONNECTED -46
1674#define DRFLAC_NOT_CONNECTED -47
1675#define DRFLAC_CONNECTION_REFUSED -48
1676#define DRFLAC_NO_HOST -49
1677#define DRFLAC_IN_PROGRESS -50
1678#define DRFLAC_CANCELLED -51
1679#define DRFLAC_MEMORY_ALREADY_MAPPED -52
1680#define DRFLAC_AT_END -53
1681#define DRFLAC_CRC_MISMATCH -128
1682
1683#define DRFLAC_SUBFRAME_CONSTANT 0
1684#define DRFLAC_SUBFRAME_VERBATIM 1
1685#define DRFLAC_SUBFRAME_FIXED 8
1686#define DRFLAC_SUBFRAME_LPC 32
1687#define DRFLAC_SUBFRAME_RESERVED 255
1688
1689#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE 0
1690#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
1691
1692#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT 0
1693#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE 8
1694#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE 9
1695#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE 10
1696
9e052883 1697#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES 18
1698#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES 36
1699#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES 12
1700
2ff0b512 1701#define drflac_align(x, a) ((((x) + (a) - 1) / (a)) * (a))
1702
1703
1704DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
1705{
1706 if (pMajor) {
1707 *pMajor = DRFLAC_VERSION_MAJOR;
1708 }
1709
1710 if (pMinor) {
1711 *pMinor = DRFLAC_VERSION_MINOR;
1712 }
1713
1714 if (pRevision) {
1715 *pRevision = DRFLAC_VERSION_REVISION;
1716 }
1717}
1718
1719DRFLAC_API const char* drflac_version_string(void)
1720{
1721 return DRFLAC_VERSION_STRING;
1722}
1723
1724
1725/* CPU caps. */
1726#if defined(__has_feature)
1727 #if __has_feature(thread_sanitizer)
1728 #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
1729 #else
1730 #define DRFLAC_NO_THREAD_SANITIZE
1731 #endif
1732#else
1733 #define DRFLAC_NO_THREAD_SANITIZE
1734#endif
1735
1736#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1737static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
1738#endif
1739
1740#ifndef DRFLAC_NO_CPUID
1741static drflac_bool32 drflac__gIsSSE2Supported = DRFLAC_FALSE;
1742static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
1743
1744/*
1745I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
1746actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade off of
1747complicating internal API's by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
1748just going to disable these warnings. This is disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
1749*/
1750DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1751{
1752 static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
1753
1754 if (!isCPUCapsInitialized) {
1755 /* LZCNT */
1756#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
1757 int info[4] = {0};
1758 drflac__cpuid(info, 0x80000001);
1759 drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
1760#endif
1761
1762 /* SSE2 */
1763 drflac__gIsSSE2Supported = drflac_has_sse2();
1764
1765 /* SSE4.1 */
1766 drflac__gIsSSE41Supported = drflac_has_sse41();
1767
1768 /* Initialized. */
1769 isCPUCapsInitialized = DRFLAC_TRUE;
1770 }
1771}
1772#else
1773static drflac_bool32 drflac__gIsNEONSupported = DRFLAC_FALSE;
1774
1775static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
1776{
1777#if defined(DRFLAC_SUPPORT_NEON)
1778 #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
1779 #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
1780 return DRFLAC_TRUE; /* If the compiler is allowed to freely generate NEON code we can assume support. */
1781 #else
1782 /* TODO: Runtime check. */
1783 return DRFLAC_FALSE;
1784 #endif
1785 #else
1786 return DRFLAC_FALSE; /* NEON is only supported on ARM architectures. */
1787 #endif
1788#else
1789 return DRFLAC_FALSE; /* No compiler support. */
1790#endif
1791}
1792
1793DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
1794{
1795 drflac__gIsNEONSupported = drflac__has_neon();
1796
1797#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
1798 drflac__gIsLZCNTSupported = DRFLAC_TRUE;
1799#endif
1800}
1801#endif
1802
1803
1804/* Endian Management */
1805static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
1806{
1807#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
1808 return DRFLAC_TRUE;
1809#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
1810 return DRFLAC_TRUE;
1811#else
1812 int n = 1;
1813 return (*(char*)&n) == 1;
1814#endif
1815}
1816
1817static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
1818{
1819#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
1820 #if defined(_MSC_VER) && !defined(__clang__)
1821 return _byteswap_ushort(n);
1822 #elif defined(__GNUC__) || defined(__clang__)
1823 return __builtin_bswap16(n);
9e052883 1824 #elif defined(__WATCOMC__) && defined(__386__)
1825 return _watcom_bswap16(n);
2ff0b512 1826 #else
1827 #error "This compiler does not support the byte swap intrinsic."
1828 #endif
1829#else
1830 return ((n & 0xFF00) >> 8) |
1831 ((n & 0x00FF) << 8);
1832#endif
1833}
1834
1835static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
1836{
1837#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
1838 #if defined(_MSC_VER) && !defined(__clang__)
1839 return _byteswap_ulong(n);
1840 #elif defined(__GNUC__) || defined(__clang__)
1841 #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(DRFLAC_64BIT) /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
1842 /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
1843 drflac_uint32 r;
1844 __asm__ __volatile__ (
1845 #if defined(DRFLAC_64BIT)
1846 "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
1847 #else
1848 "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
1849 #endif
1850 );
1851 return r;
1852 #else
1853 return __builtin_bswap32(n);
1854 #endif
9e052883 1855 #elif defined(__WATCOMC__) && defined(__386__)
1856 return _watcom_bswap32(n);
2ff0b512 1857 #else
1858 #error "This compiler does not support the byte swap intrinsic."
1859 #endif
1860#else
1861 return ((n & 0xFF000000) >> 24) |
1862 ((n & 0x00FF0000) >> 8) |
1863 ((n & 0x0000FF00) << 8) |
1864 ((n & 0x000000FF) << 24);
1865#endif
1866}
1867
1868static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
1869{
1870#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
1871 #if defined(_MSC_VER) && !defined(__clang__)
1872 return _byteswap_uint64(n);
1873 #elif defined(__GNUC__) || defined(__clang__)
1874 return __builtin_bswap64(n);
9e052883 1875 #elif defined(__WATCOMC__) && defined(__386__)
1876 return _watcom_bswap64(n);
2ff0b512 1877 #else
1878 #error "This compiler does not support the byte swap intrinsic."
1879 #endif
1880#else
1881 /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
1882 return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
1883 ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
1884 ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
1885 ((n & ((drflac_uint64)0x000000FF << 32)) >> 8) |
1886 ((n & ((drflac_uint64)0xFF000000 )) << 8) |
1887 ((n & ((drflac_uint64)0x00FF0000 )) << 24) |
1888 ((n & ((drflac_uint64)0x0000FF00 )) << 40) |
1889 ((n & ((drflac_uint64)0x000000FF )) << 56);
1890#endif
1891}
1892
1893
1894static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
1895{
1896 if (drflac__is_little_endian()) {
1897 return drflac__swap_endian_uint16(n);
1898 }
1899
1900 return n;
1901}
1902
1903static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
1904{
1905 if (drflac__is_little_endian()) {
1906 return drflac__swap_endian_uint32(n);
1907 }
1908
1909 return n;
1910}
1911
9e052883 1912static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
1913{
1914 const drflac_uint8* pNum = (drflac_uint8*)pData;
1915 return *(pNum) << 24 | *(pNum+1) << 16 | *(pNum+2) << 8 | *(pNum+3);
1916}
1917
2ff0b512 1918static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
1919{
1920 if (drflac__is_little_endian()) {
1921 return drflac__swap_endian_uint64(n);
1922 }
1923
1924 return n;
1925}
1926
1927
1928static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
1929{
1930 if (!drflac__is_little_endian()) {
1931 return drflac__swap_endian_uint32(n);
1932 }
1933
1934 return n;
1935}
1936
9e052883 1937static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
1938{
1939 const drflac_uint8* pNum = (drflac_uint8*)pData;
1940 return *pNum | *(pNum+1) << 8 | *(pNum+2) << 16 | *(pNum+3) << 24;
1941}
1942
2ff0b512 1943
1944static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
1945{
1946 drflac_uint32 result = 0;
1947 result |= (n & 0x7F000000) >> 3;
1948 result |= (n & 0x007F0000) >> 2;
1949 result |= (n & 0x00007F00) >> 1;
1950 result |= (n & 0x0000007F) >> 0;
1951
1952 return result;
1953}
1954
1955
1956
1957/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
1958static drflac_uint8 drflac__crc8_table[] = {
1959 0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
1960 0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
1961 0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
1962 0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
1963 0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
1964 0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
1965 0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
1966 0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
1967 0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
1968 0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
1969 0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
1970 0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
1971 0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
1972 0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
1973 0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
1974 0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
1975};
1976
1977static drflac_uint16 drflac__crc16_table[] = {
1978 0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
1979 0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
1980 0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
1981 0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
1982 0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
1983 0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
1984 0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
1985 0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
1986 0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
1987 0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
1988 0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
1989 0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
1990 0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
1991 0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
1992 0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
1993 0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
1994 0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
1995 0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
1996 0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
1997 0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
1998 0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
1999 0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
2000 0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
2001 0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
2002 0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
2003 0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
2004 0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
2005 0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
2006 0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
2007 0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
2008 0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
2009 0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
2010};
2011
2012static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
2013{
2014 return drflac__crc8_table[crc ^ data];
2015}
2016
2017static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
2018{
2019#ifdef DR_FLAC_NO_CRC
2020 (void)crc;
2021 (void)data;
2022 (void)count;
2023 return 0;
9e052883 2024#else
2025#if 0
2026 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
2027 drflac_uint8 p = 0x07;
2028 for (int i = count-1; i >= 0; --i) {
2029 drflac_uint8 bit = (data & (1 << i)) >> i;
2030 if (crc & 0x80) {
2031 crc = ((crc << 1) | bit) ^ p;
2032 } else {
2033 crc = ((crc << 1) | bit);
2034 }
2035 }
2036 return crc;
2ff0b512 2037#else
2038 drflac_uint32 wholeBytes;
2039 drflac_uint32 leftoverBits;
2040 drflac_uint64 leftoverDataMask;
2041
2042 static drflac_uint64 leftoverDataMaskTable[8] = {
2043 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2044 };
2045
2046 DRFLAC_ASSERT(count <= 32);
2047
2048 wholeBytes = count >> 3;
2049 leftoverBits = count - (wholeBytes*8);
2050 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2051
2052 switch (wholeBytes) {
2053 case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2054 case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2055 case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2056 case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2057 case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
2058 }
2059 return crc;
2060#endif
9e052883 2061#endif
2ff0b512 2062}
2063
2064static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
2065{
2066 return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
2067}
2068
2069static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
2070{
2071#ifdef DRFLAC_64BIT
2072 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2073 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2074 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2075 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2076#endif
2077 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2078 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2079 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2080 crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2081
2082 return crc;
2083}
2084
2085static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
2086{
2087 switch (byteCount)
2088 {
2089#ifdef DRFLAC_64BIT
2090 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
2091 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
2092 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
2093 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
2094#endif
2095 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
2096 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
2097 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 8) & 0xFF));
2098 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 0) & 0xFF));
2099 }
2100
2101 return crc;
2102}
2103
9e052883 2104#if 0
2105static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
2106{
2107#ifdef DR_FLAC_NO_CRC
2108 (void)crc;
2109 (void)data;
2110 (void)count;
2111 return 0;
2112#else
2113#if 0
2114 /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
2115 drflac_uint16 p = 0x8005;
2116 for (int i = count-1; i >= 0; --i) {
2117 drflac_uint16 bit = (data & (1ULL << i)) >> i;
2118 if (r & 0x8000) {
2119 r = ((r << 1) | bit) ^ p;
2120 } else {
2121 r = ((r << 1) | bit);
2122 }
2123 }
2124
2125 return crc;
2126#else
2127 drflac_uint32 wholeBytes;
2128 drflac_uint32 leftoverBits;
2129 drflac_uint64 leftoverDataMask;
2130
2131 static drflac_uint64 leftoverDataMaskTable[8] = {
2132 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2133 };
2134
2135 DRFLAC_ASSERT(count <= 64);
2136
2137 wholeBytes = count >> 3;
2138 leftoverBits = count & 7;
2139 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2140
2141 switch (wholeBytes) {
2142 default:
2143 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
2144 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
2145 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
2146 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
2147 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2148 }
2149 return crc;
2150#endif
2151#endif
2152}
2153
2154static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
2155{
2156#ifdef DR_FLAC_NO_CRC
2157 (void)crc;
2158 (void)data;
2159 (void)count;
2160 return 0;
2161#else
2162 drflac_uint32 wholeBytes;
2163 drflac_uint32 leftoverBits;
2164 drflac_uint64 leftoverDataMask;
2165
2166 static drflac_uint64 leftoverDataMaskTable[8] = {
2167 0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
2168 };
2169
2170 DRFLAC_ASSERT(count <= 64);
2171
2172 wholeBytes = count >> 3;
2173 leftoverBits = count & 7;
2174 leftoverDataMask = leftoverDataMaskTable[leftoverBits];
2175
2176 switch (wholeBytes) {
2177 default:
2178 case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits))); /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
2179 case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
2180 case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
2181 case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
2182 case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 ) << leftoverBits)) >> (24 + leftoverBits)));
2183 case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 ) << leftoverBits)) >> (16 + leftoverBits)));
2184 case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 ) << leftoverBits)) >> ( 8 + leftoverBits)));
2185 case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF ) << leftoverBits)) >> ( 0 + leftoverBits)));
2186 case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
2187 }
2188 return crc;
2189#endif
2190}
2191
2192
2193static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
2194{
2195#ifdef DRFLAC_64BIT
2196 return drflac_crc16__64bit(crc, data, count);
2197#else
2198 return drflac_crc16__32bit(crc, data, count);
2199#endif
2200}
2201#endif
2202
2203
2ff0b512 2204#ifdef DRFLAC_64BIT
2205#define drflac__be2host__cache_line drflac__be2host_64
2206#else
2207#define drflac__be2host__cache_line drflac__be2host_32
2208#endif
2209
2210/*
2211BIT READING ATTEMPT #2
2212
2213This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
2214on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
2215is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
2216array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
2217from onRead() is read into.
2218*/
2219#define DRFLAC_CACHE_L1_SIZE_BYTES(bs) (sizeof((bs)->cache))
2220#define DRFLAC_CACHE_L1_SIZE_BITS(bs) (sizeof((bs)->cache)*8)
2221#define DRFLAC_CACHE_L1_BITS_REMAINING(bs) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
2222#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount) (~((~(drflac_cache_t)0) >> (_bitCount)))
2223#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
2224#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount) (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
2225#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount) (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
2226#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
2227#define DRFLAC_CACHE_L2_SIZE_BYTES(bs) (sizeof((bs)->cacheL2))
2228#define DRFLAC_CACHE_L2_LINE_COUNT(bs) (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
2229#define DRFLAC_CACHE_L2_LINES_REMAINING(bs) (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
2230
2231
2232#ifndef DR_FLAC_NO_CRC
2233static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
2234{
2235 bs->crc16 = 0;
2236 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2237}
2238
2239static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
2240{
2241 if (bs->crc16CacheIgnoredBytes == 0) {
2242 bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
2243 } else {
2244 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
2245 bs->crc16CacheIgnoredBytes = 0;
2246 }
2247}
2248
2249static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
2250{
2251 /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
2252 DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
2253
2254 /*
2255 The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
2256 by the number of bits that have been consumed.
2257 */
2258 if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
2259 drflac__update_crc16(bs);
2260 } else {
2261 /* We only accumulate the consumed bits. */
2262 bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
2263
2264 /*
2265 The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
2266 so we can handle that later.
2267 */
2268 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2269 }
2270
2271 return bs->crc16;
2272}
2273#endif
2274
2275static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
2276{
2277 size_t bytesRead;
2278 size_t alignedL1LineCount;
2279
2280 /* Fast path. Try loading straight from L2. */
2281 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
2282 bs->cache = bs->cacheL2[bs->nextL2Line++];
2283 return DRFLAC_TRUE;
2284 }
2285
2286 /*
2287 If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
2288 any left.
2289 */
2290 if (bs->unalignedByteCount > 0) {
2291 return DRFLAC_FALSE; /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
2292 }
2293
2294 bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
2295
2296 bs->nextL2Line = 0;
2297 if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
2298 bs->cache = bs->cacheL2[bs->nextL2Line++];
2299 return DRFLAC_TRUE;
2300 }
2301
2302
2303 /*
2304 If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
2305 means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
2306 and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
2307 the size of the L1 so we'll need to seek backwards by any misaligned bytes.
2308 */
2309 alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
2310
2311 /* We need to keep track of any unaligned bytes for later use. */
2312 bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2313 if (bs->unalignedByteCount > 0) {
2314 bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
2315 }
2316
2317 if (alignedL1LineCount > 0) {
2318 size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
2319 size_t i;
2320 for (i = alignedL1LineCount; i > 0; --i) {
2321 bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
2322 }
2323
2324 bs->nextL2Line = (drflac_uint32)offset;
2325 bs->cache = bs->cacheL2[bs->nextL2Line++];
2326 return DRFLAC_TRUE;
2327 } else {
2328 /* If we get into this branch it means we weren't able to load any L1-aligned data. */
2329 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
2330 return DRFLAC_FALSE;
2331 }
2332}
2333
2334static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
2335{
2336 size_t bytesRead;
2337
2338#ifndef DR_FLAC_NO_CRC
2339 drflac__update_crc16(bs);
2340#endif
2341
2342 /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
2343 if (drflac__reload_l1_cache_from_l2(bs)) {
2344 bs->cache = drflac__be2host__cache_line(bs->cache);
2345 bs->consumedBits = 0;
2346#ifndef DR_FLAC_NO_CRC
2347 bs->crc16Cache = bs->cache;
2348#endif
2349 return DRFLAC_TRUE;
2350 }
2351
2352 /* Slow path. */
2353
2354 /*
2355 If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
2356 few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
2357 data from the unaligned cache.
2358 */
2359 bytesRead = bs->unalignedByteCount;
2360 if (bytesRead == 0) {
2361 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- The stream has been exhausted, so marked the bits as consumed. */
2362 return DRFLAC_FALSE;
2363 }
2364
2365 DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
2366 bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
2367
2368 bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
2369 bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs)); /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
2370 bs->unalignedByteCount = 0; /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
2371
2372#ifndef DR_FLAC_NO_CRC
2373 bs->crc16Cache = bs->cache >> bs->consumedBits;
2374 bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
2375#endif
2376 return DRFLAC_TRUE;
2377}
2378
2379static void drflac__reset_cache(drflac_bs* bs)
2380{
2381 bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs); /* <-- This clears the L2 cache. */
2382 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs); /* <-- This clears the L1 cache. */
2383 bs->cache = 0;
2384 bs->unalignedByteCount = 0; /* <-- This clears the trailing unaligned bytes. */
2385 bs->unalignedCache = 0;
2386
2387#ifndef DR_FLAC_NO_CRC
2388 bs->crc16Cache = 0;
2389 bs->crc16CacheIgnoredBytes = 0;
2390#endif
2391}
2392
2393
2394static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
2395{
2396 DRFLAC_ASSERT(bs != NULL);
2397 DRFLAC_ASSERT(pResultOut != NULL);
2398 DRFLAC_ASSERT(bitCount > 0);
2399 DRFLAC_ASSERT(bitCount <= 32);
2400
2401 if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2402 if (!drflac__reload_cache(bs)) {
2403 return DRFLAC_FALSE;
2404 }
2405 }
2406
2407 if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2408 /*
2409 If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
2410 a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
2411 more optimal solution for this.
2412 */
2413#ifdef DRFLAC_64BIT
2414 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2415 bs->consumedBits += bitCount;
2416 bs->cache <<= bitCount;
2417#else
2418 if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2419 *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
2420 bs->consumedBits += bitCount;
2421 bs->cache <<= bitCount;
2422 } else {
2423 /* Cannot shift by 32-bits, so need to do it differently. */
2424 *pResultOut = (drflac_uint32)bs->cache;
2425 bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
2426 bs->cache = 0;
2427 }
2428#endif
2429
2430 return DRFLAC_TRUE;
2431 } else {
2432 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
2433 drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2434 drflac_uint32 bitCountLo = bitCount - bitCountHi;
2435 drflac_uint32 resultHi;
2436
2437 DRFLAC_ASSERT(bitCountHi > 0);
2438 DRFLAC_ASSERT(bitCountHi < 32);
2439 resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
2440
2441 if (!drflac__reload_cache(bs)) {
2442 return DRFLAC_FALSE;
2443 }
9e052883 2444 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2445 /* This happens when we get to end of stream */
2446 return DRFLAC_FALSE;
2447 }
2ff0b512 2448
2449 *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
2450 bs->consumedBits += bitCountLo;
2451 bs->cache <<= bitCountLo;
2452 return DRFLAC_TRUE;
2453 }
2454}
2455
2456static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
2457{
2458 drflac_uint32 result;
2459
2460 DRFLAC_ASSERT(bs != NULL);
2461 DRFLAC_ASSERT(pResult != NULL);
2462 DRFLAC_ASSERT(bitCount > 0);
2463 DRFLAC_ASSERT(bitCount <= 32);
2464
2465 if (!drflac__read_uint32(bs, bitCount, &result)) {
2466 return DRFLAC_FALSE;
2467 }
2468
2469 /* Do not attempt to shift by 32 as it's undefined. */
2470 if (bitCount < 32) {
2471 drflac_uint32 signbit;
2472 signbit = ((result >> (bitCount-1)) & 0x01);
2473 result |= (~signbit + 1) << bitCount;
2474 }
2475
2476 *pResult = (drflac_int32)result;
2477 return DRFLAC_TRUE;
2478}
2479
2480#ifdef DRFLAC_64BIT
2481static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
2482{
2483 drflac_uint32 resultHi;
2484 drflac_uint32 resultLo;
2485
2486 DRFLAC_ASSERT(bitCount <= 64);
2487 DRFLAC_ASSERT(bitCount > 32);
2488
2489 if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
2490 return DRFLAC_FALSE;
2491 }
2492
2493 if (!drflac__read_uint32(bs, 32, &resultLo)) {
2494 return DRFLAC_FALSE;
2495 }
2496
2497 *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
2498 return DRFLAC_TRUE;
2499}
2500#endif
2501
9e052883 2502/* Function below is unused, but leaving it here in case I need to quickly add it again. */
2503#if 0
2504static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
2505{
2506 drflac_uint64 result;
2507 drflac_uint64 signbit;
2508
2509 DRFLAC_ASSERT(bitCount <= 64);
2510
2511 if (!drflac__read_uint64(bs, bitCount, &result)) {
2512 return DRFLAC_FALSE;
2513 }
2514
2515 signbit = ((result >> (bitCount-1)) & 0x01);
2516 result |= (~signbit + 1) << bitCount;
2517
2518 *pResultOut = (drflac_int64)result;
2519 return DRFLAC_TRUE;
2520}
2521#endif
2522
2ff0b512 2523static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
2524{
2525 drflac_uint32 result;
2526
2527 DRFLAC_ASSERT(bs != NULL);
2528 DRFLAC_ASSERT(pResult != NULL);
2529 DRFLAC_ASSERT(bitCount > 0);
2530 DRFLAC_ASSERT(bitCount <= 16);
2531
2532 if (!drflac__read_uint32(bs, bitCount, &result)) {
2533 return DRFLAC_FALSE;
2534 }
2535
2536 *pResult = (drflac_uint16)result;
2537 return DRFLAC_TRUE;
2538}
2539
9e052883 2540#if 0
2541static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
2542{
2543 drflac_int32 result;
2544
2545 DRFLAC_ASSERT(bs != NULL);
2546 DRFLAC_ASSERT(pResult != NULL);
2547 DRFLAC_ASSERT(bitCount > 0);
2548 DRFLAC_ASSERT(bitCount <= 16);
2549
2550 if (!drflac__read_int32(bs, bitCount, &result)) {
2551 return DRFLAC_FALSE;
2552 }
2553
2554 *pResult = (drflac_int16)result;
2555 return DRFLAC_TRUE;
2556}
2557#endif
2558
2ff0b512 2559static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
2560{
2561 drflac_uint32 result;
2562
2563 DRFLAC_ASSERT(bs != NULL);
2564 DRFLAC_ASSERT(pResult != NULL);
2565 DRFLAC_ASSERT(bitCount > 0);
2566 DRFLAC_ASSERT(bitCount <= 8);
2567
2568 if (!drflac__read_uint32(bs, bitCount, &result)) {
2569 return DRFLAC_FALSE;
2570 }
2571
2572 *pResult = (drflac_uint8)result;
2573 return DRFLAC_TRUE;
2574}
2575
2576static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
2577{
2578 drflac_int32 result;
2579
2580 DRFLAC_ASSERT(bs != NULL);
2581 DRFLAC_ASSERT(pResult != NULL);
2582 DRFLAC_ASSERT(bitCount > 0);
2583 DRFLAC_ASSERT(bitCount <= 8);
2584
2585 if (!drflac__read_int32(bs, bitCount, &result)) {
2586 return DRFLAC_FALSE;
2587 }
2588
2589 *pResult = (drflac_int8)result;
2590 return DRFLAC_TRUE;
2591}
2592
2593
2594static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
2595{
2596 if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2597 bs->consumedBits += (drflac_uint32)bitsToSeek;
2598 bs->cache <<= bitsToSeek;
2599 return DRFLAC_TRUE;
2600 } else {
2601 /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
2602 bitsToSeek -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2603 bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2604 bs->cache = 0;
2605
2606 /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
2607#ifdef DRFLAC_64BIT
2608 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2609 drflac_uint64 bin;
2610 if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2611 return DRFLAC_FALSE;
2612 }
2613 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2614 }
2615#else
2616 while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
2617 drflac_uint32 bin;
2618 if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
2619 return DRFLAC_FALSE;
2620 }
2621 bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
2622 }
2623#endif
2624
2625 /* Whole leftover bytes. */
2626 while (bitsToSeek >= 8) {
2627 drflac_uint8 bin;
2628 if (!drflac__read_uint8(bs, 8, &bin)) {
2629 return DRFLAC_FALSE;
2630 }
2631 bitsToSeek -= 8;
2632 }
2633
2634 /* Leftover bits. */
2635 if (bitsToSeek > 0) {
2636 drflac_uint8 bin;
2637 if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
2638 return DRFLAC_FALSE;
2639 }
2640 bitsToSeek = 0; /* <-- Necessary for the assert below. */
2641 }
2642
2643 DRFLAC_ASSERT(bitsToSeek == 0);
2644 return DRFLAC_TRUE;
2645 }
2646}
2647
2648
2649/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the of the frame header). It will also update the CRC-16. */
2650static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
2651{
2652 DRFLAC_ASSERT(bs != NULL);
2653
2654 /*
2655 The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
2656 thing to do is align to the next byte.
2657 */
2658 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2659 return DRFLAC_FALSE;
2660 }
2661
2662 for (;;) {
2663 drflac_uint8 hi;
2664
2665#ifndef DR_FLAC_NO_CRC
2666 drflac__reset_crc16(bs);
2667#endif
2668
2669 if (!drflac__read_uint8(bs, 8, &hi)) {
2670 return DRFLAC_FALSE;
2671 }
2672
2673 if (hi == 0xFF) {
2674 drflac_uint8 lo;
2675 if (!drflac__read_uint8(bs, 6, &lo)) {
2676 return DRFLAC_FALSE;
2677 }
2678
2679 if (lo == 0x3E) {
2680 return DRFLAC_TRUE;
2681 } else {
2682 if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
2683 return DRFLAC_FALSE;
2684 }
2685 }
2686 }
2687 }
2688
2689 /* Should never get here. */
2690 /*return DRFLAC_FALSE;*/
2691}
2692
2693
2694#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
2695#define DRFLAC_IMPLEMENT_CLZ_LZCNT
2696#endif
2697#if defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
2698#define DRFLAC_IMPLEMENT_CLZ_MSVC
2699#endif
9e052883 2700#if defined(__WATCOMC__) && defined(__386__)
2701#define DRFLAC_IMPLEMENT_CLZ_WATCOM
2702#endif
2703#ifdef __MRC__
2704#include <intrinsics.h>
2705#define DRFLAC_IMPLEMENT_CLZ_MRC
2706#endif
2ff0b512 2707
2708static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
2709{
2710 drflac_uint32 n;
2711 static drflac_uint32 clz_table_4[] = {
2712 0,
2713 4,
2714 3, 3,
2715 2, 2, 2, 2,
2716 1, 1, 1, 1, 1, 1, 1, 1
2717 };
2718
2719 if (x == 0) {
2720 return sizeof(x)*8;
2721 }
2722
2723 n = clz_table_4[x >> (sizeof(x)*8 - 4)];
2724 if (n == 0) {
2725#ifdef DRFLAC_64BIT
2726 if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n = 32; x <<= 32; }
2727 if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
2728 if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8; x <<= 8; }
2729 if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4; x <<= 4; }
2730#else
2731 if ((x & 0xFFFF0000) == 0) { n = 16; x <<= 16; }
2732 if ((x & 0xFF000000) == 0) { n += 8; x <<= 8; }
2733 if ((x & 0xF0000000) == 0) { n += 4; x <<= 4; }
2734#endif
2735 n += clz_table_4[x >> (sizeof(x)*8 - 4)];
2736 }
2737
2738 return n - 1;
2739}
2740
2741#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2742static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
2743{
2744 /* Fast compile time check for ARM. */
2745#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
2746 return DRFLAC_TRUE;
9e052883 2747#elif defined(__MRC__)
2748 return DRFLAC_TRUE;
2ff0b512 2749#else
2750 /* If the compiler itself does not support the intrinsic then we'll need to return false. */
2751 #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
2752 return drflac__gIsLZCNTSupported;
2753 #else
2754 return DRFLAC_FALSE;
2755 #endif
2756#endif
2757}
2758
2759static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
2760{
2761 /*
2762 It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
2763 to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
2764 it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
2765 64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
2766 around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
2767 the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
2768 in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
2769 getting clobbered?
2770
2771 I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
2772 assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
2773
2774 Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
2775 compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
2776 to know how to fix the inlined assembly for correctness sake, however.
2777 */
2778
2779#if defined(_MSC_VER) /*&& !defined(__clang__)*/ /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
2780 #ifdef DRFLAC_64BIT
2781 return (drflac_uint32)__lzcnt64(x);
2782 #else
2783 return (drflac_uint32)__lzcnt(x);
2784 #endif
2785#else
2786 #if defined(__GNUC__) || defined(__clang__)
2787 #if defined(DRFLAC_X64)
2788 {
2789 drflac_uint64 r;
2790 __asm__ __volatile__ (
2791 "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2792 );
2793
2794 return (drflac_uint32)r;
2795 }
2796 #elif defined(DRFLAC_X86)
2797 {
2798 drflac_uint32 r;
2799 __asm__ __volatile__ (
2800 "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
2801 );
2802
2803 return r;
2804 }
2805 #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(DRFLAC_64BIT) /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
2806 {
2807 unsigned int r;
2808 __asm__ __volatile__ (
2809 #if defined(DRFLAC_64BIT)
2810 "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x) /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
2811 #else
2812 "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
2813 #endif
2814 );
2815
2816 return r;
2817 }
2818 #else
2819 if (x == 0) {
2820 return sizeof(x)*8;
2821 }
2822 #ifdef DRFLAC_64BIT
2823 return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
2824 #else
2825 return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
2826 #endif
2827 #endif
2828 #else
2829 /* Unsupported compiler. */
2830 #error "This compiler does not support the lzcnt intrinsic."
2831 #endif
2832#endif
2833}
2834#endif
2835
2836#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2837#include <intrin.h> /* For BitScanReverse(). */
2838
2839static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
2840{
2841 drflac_uint32 n;
2842
2843 if (x == 0) {
2844 return sizeof(x)*8;
2845 }
2846
2847#ifdef DRFLAC_64BIT
2848 _BitScanReverse64((unsigned long*)&n, x);
2849#else
2850 _BitScanReverse((unsigned long*)&n, x);
2851#endif
2852 return sizeof(x)*8 - n - 1;
2853}
2854#endif
2855
9e052883 2856#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
2857static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
2858#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
2859/* Use the LZCNT instruction (only available on some processors since the 2010s). */
2860#pragma aux drflac__clz_watcom_lzcnt = \
2861 "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
2862 parm [eax] \
2863 value [eax] \
2864 modify nomemory;
2865#else
2866/* Use the 386+-compatible implementation. */
2867#pragma aux drflac__clz_watcom = \
2868 "bsr eax, eax" \
2869 "xor eax, 31" \
2870 parm [eax] nomemory \
2871 value [eax] \
2872 modify exact [eax] nomemory;
2873#endif
2874#endif
2875
2ff0b512 2876static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
2877{
2878#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
2879 if (drflac__is_lzcnt_supported()) {
2880 return drflac__clz_lzcnt(x);
2881 } else
2882#endif
2883 {
2884#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
2885 return drflac__clz_msvc(x);
9e052883 2886#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
2887 return drflac__clz_watcom_lzcnt(x);
2888#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
2889 return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
2890#elif defined(__MRC__)
2891 return __cntlzw(x);
2ff0b512 2892#else
2893 return drflac__clz_software(x);
2894#endif
2895 }
2896}
2897
2898
2899static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
2900{
2901 drflac_uint32 zeroCounter = 0;
2902 drflac_uint32 setBitOffsetPlus1;
2903
2904 while (bs->cache == 0) {
2905 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
2906 if (!drflac__reload_cache(bs)) {
2907 return DRFLAC_FALSE;
2908 }
2909 }
2910
9e052883 2911 if (bs->cache == 1) {
2912 /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
2913 *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
2914 if (!drflac__reload_cache(bs)) {
2915 return DRFLAC_FALSE;
2916 }
2917
2918 return DRFLAC_TRUE;
2919 }
2920
2ff0b512 2921 setBitOffsetPlus1 = drflac__clz(bs->cache);
2922 setBitOffsetPlus1 += 1;
2923
9e052883 2924 if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
2925 /* This happens when we get to end of stream */
2926 return DRFLAC_FALSE;
2927 }
2928
2ff0b512 2929 bs->consumedBits += setBitOffsetPlus1;
2930 bs->cache <<= setBitOffsetPlus1;
2931
2932 *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
2933 return DRFLAC_TRUE;
2934}
2935
2936
2937
2938static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
2939{
2940 DRFLAC_ASSERT(bs != NULL);
2941 DRFLAC_ASSERT(offsetFromStart > 0);
2942
2943 /*
2944 Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
2945 is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
2946 To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
2947 */
2948 if (offsetFromStart > 0x7FFFFFFF) {
2949 drflac_uint64 bytesRemaining = offsetFromStart;
2950 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
2951 return DRFLAC_FALSE;
2952 }
2953 bytesRemaining -= 0x7FFFFFFF;
2954
2955 while (bytesRemaining > 0x7FFFFFFF) {
2956 if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
2957 return DRFLAC_FALSE;
2958 }
2959 bytesRemaining -= 0x7FFFFFFF;
2960 }
2961
2962 if (bytesRemaining > 0) {
2963 if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
2964 return DRFLAC_FALSE;
2965 }
2966 }
2967 } else {
2968 if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
2969 return DRFLAC_FALSE;
2970 }
2971 }
2972
2973 /* The cache should be reset to force a reload of fresh data from the client. */
2974 drflac__reset_cache(bs);
2975 return DRFLAC_TRUE;
2976}
2977
2978
2979static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
2980{
2981 drflac_uint8 crc;
2982 drflac_uint64 result;
2983 drflac_uint8 utf8[7] = {0};
2984 int byteCount;
2985 int i;
2986
2987 DRFLAC_ASSERT(bs != NULL);
2988 DRFLAC_ASSERT(pNumberOut != NULL);
2989 DRFLAC_ASSERT(pCRCOut != NULL);
2990
2991 crc = *pCRCOut;
2992
2993 if (!drflac__read_uint8(bs, 8, utf8)) {
2994 *pNumberOut = 0;
2995 return DRFLAC_AT_END;
2996 }
2997 crc = drflac_crc8(crc, utf8[0], 8);
2998
2999 if ((utf8[0] & 0x80) == 0) {
3000 *pNumberOut = utf8[0];
3001 *pCRCOut = crc;
3002 return DRFLAC_SUCCESS;
3003 }
3004
3005 /*byteCount = 1;*/
3006 if ((utf8[0] & 0xE0) == 0xC0) {
3007 byteCount = 2;
3008 } else if ((utf8[0] & 0xF0) == 0xE0) {
3009 byteCount = 3;
3010 } else if ((utf8[0] & 0xF8) == 0xF0) {
3011 byteCount = 4;
3012 } else if ((utf8[0] & 0xFC) == 0xF8) {
3013 byteCount = 5;
3014 } else if ((utf8[0] & 0xFE) == 0xFC) {
3015 byteCount = 6;
3016 } else if ((utf8[0] & 0xFF) == 0xFE) {
3017 byteCount = 7;
3018 } else {
3019 *pNumberOut = 0;
3020 return DRFLAC_CRC_MISMATCH; /* Bad UTF-8 encoding. */
3021 }
3022
3023 /* Read extra bytes. */
3024 DRFLAC_ASSERT(byteCount > 1);
3025
3026 result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
3027 for (i = 1; i < byteCount; ++i) {
3028 if (!drflac__read_uint8(bs, 8, utf8 + i)) {
3029 *pNumberOut = 0;
3030 return DRFLAC_AT_END;
3031 }
3032 crc = drflac_crc8(crc, utf8[i], 8);
3033
3034 result = (result << 6) | (utf8[i] & 0x3F);
3035 }
3036
3037 *pNumberOut = result;
3038 *pCRCOut = crc;
3039 return DRFLAC_SUCCESS;
3040}
3041
3042
9e052883 3043static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
3044{
3045#if 1 /* Needs optimizing. */
3046 drflac_uint32 result = 0;
3047 while (x > 0) {
3048 result += 1;
3049 x >>= 1;
3050 }
3051
3052 return result;
3053#endif
3054}
3055
3056static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
3057{
3058 /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
3059 return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
3060}
3061
2ff0b512 3062
3063/*
3064The next two functions are responsible for calculating the prediction.
3065
3066When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
3067safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
3068*/
9e052883 3069#if defined(__clang__)
3070__attribute__((no_sanitize("signed-integer-overflow")))
3071#endif
2ff0b512 3072static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3073{
3074 drflac_int32 prediction = 0;
3075
3076 DRFLAC_ASSERT(order <= 32);
3077
3078 /* 32-bit version. */
3079
3080 /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
3081 switch (order)
3082 {
3083 case 32: prediction += coefficients[31] * pDecodedSamples[-32];
3084 case 31: prediction += coefficients[30] * pDecodedSamples[-31];
3085 case 30: prediction += coefficients[29] * pDecodedSamples[-30];
3086 case 29: prediction += coefficients[28] * pDecodedSamples[-29];
3087 case 28: prediction += coefficients[27] * pDecodedSamples[-28];
3088 case 27: prediction += coefficients[26] * pDecodedSamples[-27];
3089 case 26: prediction += coefficients[25] * pDecodedSamples[-26];
3090 case 25: prediction += coefficients[24] * pDecodedSamples[-25];
3091 case 24: prediction += coefficients[23] * pDecodedSamples[-24];
3092 case 23: prediction += coefficients[22] * pDecodedSamples[-23];
3093 case 22: prediction += coefficients[21] * pDecodedSamples[-22];
3094 case 21: prediction += coefficients[20] * pDecodedSamples[-21];
3095 case 20: prediction += coefficients[19] * pDecodedSamples[-20];
3096 case 19: prediction += coefficients[18] * pDecodedSamples[-19];
3097 case 18: prediction += coefficients[17] * pDecodedSamples[-18];
3098 case 17: prediction += coefficients[16] * pDecodedSamples[-17];
3099 case 16: prediction += coefficients[15] * pDecodedSamples[-16];
3100 case 15: prediction += coefficients[14] * pDecodedSamples[-15];
3101 case 14: prediction += coefficients[13] * pDecodedSamples[-14];
3102 case 13: prediction += coefficients[12] * pDecodedSamples[-13];
3103 case 12: prediction += coefficients[11] * pDecodedSamples[-12];
3104 case 11: prediction += coefficients[10] * pDecodedSamples[-11];
3105 case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
3106 case 9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
3107 case 8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
3108 case 7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
3109 case 6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
3110 case 5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
3111 case 4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
3112 case 3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
3113 case 2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
3114 case 1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
3115 }
3116
3117 return (drflac_int32)(prediction >> shift);
3118}
3119
3120static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
3121{
3122 drflac_int64 prediction;
3123
3124 DRFLAC_ASSERT(order <= 32);
3125
3126 /* 64-bit version. */
3127
3128 /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
3129#ifndef DRFLAC_64BIT
3130 if (order == 8)
3131 {
3132 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3133 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3134 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3135 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3136 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3137 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3138 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3139 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3140 }
3141 else if (order == 7)
3142 {
3143 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3144 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3145 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3146 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3147 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3148 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3149 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3150 }
3151 else if (order == 3)
3152 {
3153 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3154 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3155 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3156 }
3157 else if (order == 6)
3158 {
3159 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3160 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3161 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3162 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3163 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3164 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3165 }
3166 else if (order == 5)
3167 {
3168 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3169 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3170 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3171 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3172 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3173 }
3174 else if (order == 4)
3175 {
3176 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3177 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3178 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3179 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3180 }
3181 else if (order == 12)
3182 {
3183 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3184 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3185 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3186 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3187 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3188 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3189 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3190 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3191 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3192 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3193 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3194 prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3195 }
3196 else if (order == 2)
3197 {
3198 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3199 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3200 }
3201 else if (order == 1)
3202 {
3203 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3204 }
3205 else if (order == 10)
3206 {
3207 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3208 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3209 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3210 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3211 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3212 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3213 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3214 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3215 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3216 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3217 }
3218 else if (order == 9)
3219 {
3220 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3221 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3222 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3223 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3224 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3225 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3226 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3227 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3228 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3229 }
3230 else if (order == 11)
3231 {
3232 prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
3233 prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
3234 prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
3235 prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
3236 prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
3237 prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
3238 prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
3239 prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
3240 prediction += coefficients[8] * (drflac_int64)pDecodedSamples[-9];
3241 prediction += coefficients[9] * (drflac_int64)pDecodedSamples[-10];
3242 prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3243 }
3244 else
3245 {
3246 int j;
3247
3248 prediction = 0;
3249 for (j = 0; j < (int)order; ++j) {
3250 prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
3251 }
3252 }
3253#endif
3254
3255 /*
3256 VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
3257 reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
3258 */
3259#ifdef DRFLAC_64BIT
3260 prediction = 0;
3261 switch (order)
3262 {
3263 case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
3264 case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
3265 case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
3266 case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
3267 case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
3268 case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
3269 case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
3270 case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
3271 case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
3272 case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
3273 case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
3274 case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
3275 case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
3276 case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
3277 case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
3278 case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
3279 case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
3280 case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
3281 case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
3282 case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
3283 case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
3284 case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
3285 case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
3286 case 9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
3287 case 8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
3288 case 7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
3289 case 6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
3290 case 5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
3291 case 4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
3292 case 3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
3293 case 2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
3294 case 1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
3295 }
3296#endif
3297
3298 return (drflac_int32)(prediction >> shift);
3299}
3300
2ff0b512 3301
9e052883 3302#if 0
3303/*
3304Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
3305sake of readability and should only be used as a reference.
3306*/
3307static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3308{
3309 drflac_uint32 i;
2ff0b512 3310
9e052883 3311 DRFLAC_ASSERT(bs != NULL);
3312 DRFLAC_ASSERT(pSamplesOut != NULL);
2ff0b512 3313
9e052883 3314 for (i = 0; i < count; ++i) {
3315 drflac_uint32 zeroCounter = 0;
3316 for (;;) {
3317 drflac_uint8 bit;
3318 if (!drflac__read_uint8(bs, 1, &bit)) {
3319 return DRFLAC_FALSE;
3320 }
2ff0b512 3321
9e052883 3322 if (bit == 0) {
3323 zeroCounter += 1;
3324 } else {
3325 break;
3326 }
3327 }
3328
3329 drflac_uint32 decodedRice;
3330 if (riceParam > 0) {
3331 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3332 return DRFLAC_FALSE;
3333 }
3334 } else {
3335 decodedRice = 0;
3336 }
3337
3338 decodedRice |= (zeroCounter << riceParam);
3339 if ((decodedRice & 0x01)) {
3340 decodedRice = ~(decodedRice >> 1);
3341 } else {
3342 decodedRice = (decodedRice >> 1);
3343 }
3344
3345
3346 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3347 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3348 } else {
3349 pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
3350 }
3351 }
3352
3353 return DRFLAC_TRUE;
3354}
3355#endif
3356
3357#if 0
3358static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3359{
3360 drflac_uint32 zeroCounter = 0;
3361 drflac_uint32 decodedRice;
3362
3363 for (;;) {
3364 drflac_uint8 bit;
3365 if (!drflac__read_uint8(bs, 1, &bit)) {
3366 return DRFLAC_FALSE;
3367 }
3368
3369 if (bit == 0) {
3370 zeroCounter += 1;
3371 } else {
3372 break;
3373 }
3374 }
3375
3376 if (riceParam > 0) {
3377 if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
3378 return DRFLAC_FALSE;
3379 }
3380 } else {
3381 decodedRice = 0;
3382 }
3383
3384 *pZeroCounterOut = zeroCounter;
3385 *pRiceParamPartOut = decodedRice;
3386 return DRFLAC_TRUE;
3387}
3388#endif
3389
3390#if 0
3391static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3392{
3393 drflac_cache_t riceParamMask;
3394 drflac_uint32 zeroCounter;
3395 drflac_uint32 setBitOffsetPlus1;
3396 drflac_uint32 riceParamPart;
3397 drflac_uint32 riceLength;
3398
3399 DRFLAC_ASSERT(riceParam > 0); /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
3400
3401 riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
3402
3403 zeroCounter = 0;
3404 while (bs->cache == 0) {
3405 zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
3406 if (!drflac__reload_cache(bs)) {
3407 return DRFLAC_FALSE;
3408 }
3409 }
3410
3411 setBitOffsetPlus1 = drflac__clz(bs->cache);
3412 zeroCounter += setBitOffsetPlus1;
3413 setBitOffsetPlus1 += 1;
3414
3415 riceLength = setBitOffsetPlus1 + riceParam;
3416 if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3417 riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
3418
3419 bs->consumedBits += riceLength;
3420 bs->cache <<= riceLength;
3421 } else {
3422 drflac_uint32 bitCountLo;
3423 drflac_cache_t resultHi;
3424
3425 bs->consumedBits += riceLength;
3426 bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1); /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
3427
3428 /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
3429 bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
3430 resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam); /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
3431
3432 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3433#ifndef DR_FLAC_NO_CRC
3434 drflac__update_crc16(bs);
3435#endif
3436 bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3437 bs->consumedBits = 0;
3438#ifndef DR_FLAC_NO_CRC
3439 bs->crc16Cache = bs->cache;
3440#endif
3441 } else {
3442 /* Slow path. We need to fetch more data from the client. */
3443 if (!drflac__reload_cache(bs)) {
3444 return DRFLAC_FALSE;
3445 }
3446 if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3447 /* This happens when we get to end of stream */
3448 return DRFLAC_FALSE;
3449 }
3450 }
3451
3452 riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
3453
3454 bs->consumedBits += bitCountLo;
3455 bs->cache <<= bitCountLo;
3456 }
3457
3458 pZeroCounterOut[0] = zeroCounter;
3459 pRiceParamPartOut[0] = riceParamPart;
3460
3461 return DRFLAC_TRUE;
3462}
3463#endif
3464
3465static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
3466{
3467 drflac_uint32 riceParamPlus1 = riceParam + 1;
3468 /*drflac_cache_t riceParamPlus1Mask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
3469 drflac_uint32 riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
3470 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3471
3472 /*
3473 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3474 no idea how this will work in practice...
3475 */
3476 drflac_cache_t bs_cache = bs->cache;
3477 drflac_uint32 bs_consumedBits = bs->consumedBits;
3478
3479 /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
3480 drflac_uint32 lzcount = drflac__clz(bs_cache);
3481 if (lzcount < sizeof(bs_cache)*8) {
3482 pZeroCounterOut[0] = lzcount;
3483
3484 /*
3485 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3486 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3487 outside of this function at a higher level.
3488 */
3489 extract_rice_param_part:
3490 bs_cache <<= lzcount;
3491 bs_consumedBits += lzcount;
3492
3493 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
f5b7bb83 3494 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3495 pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
3496 bs_cache <<= riceParamPlus1;
3497 bs_consumedBits += riceParamPlus1;
2ff0b512 3498 } else {
f5b7bb83 3499 drflac_uint32 riceParamPartHi;
3500 drflac_uint32 riceParamPartLo;
3501 drflac_uint32 riceParamPartLoBitCount;
2ff0b512 3502
f5b7bb83 3503 /*
3504 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3505 line, reload the cache, and then combine it with the head of the next cache line.
3506 */
2ff0b512 3507
f5b7bb83 3508 /* Grab the high part of the rice parameter part. */
3509 riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
2ff0b512 3510
f5b7bb83 3511 /* Before reloading the cache we need to grab the size in bits of the low part. */
3512 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3513 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
2ff0b512 3514
f5b7bb83 3515 /* Now reload the cache. */
3516 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3517 #ifndef DR_FLAC_NO_CRC
3518 drflac__update_crc16(bs);
3519 #endif
3520 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3521 bs_consumedBits = riceParamPartLoBitCount;
3522 #ifndef DR_FLAC_NO_CRC
3523 bs->crc16Cache = bs_cache;
3524 #endif
3525 } else {
3526 /* Slow path. We need to fetch more data from the client. */
3527 if (!drflac__reload_cache(bs)) {
3528 return DRFLAC_FALSE;
3529 }
9e052883 3530 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3531 /* This happens when we get to end of stream */
3532 return DRFLAC_FALSE;
3533 }
2ff0b512 3534
f5b7bb83 3535 bs_cache = bs->cache;
3536 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3537 }
2ff0b512 3538
3539 /* We should now have enough information to construct the rice parameter part. */
3540 riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
3541 pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
3542
3543 bs_cache <<= riceParamPartLoBitCount;
3544 }
3545 } else {
3546 /*
3547 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3548 to drflac__clz() and we need to reload the cache.
3549 */
3550 drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
3551 for (;;) {
3552 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3553 #ifndef DR_FLAC_NO_CRC
3554 drflac__update_crc16(bs);
3555 #endif
3556 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3557 bs_consumedBits = 0;
3558 #ifndef DR_FLAC_NO_CRC
3559 bs->crc16Cache = bs_cache;
3560 #endif
3561 } else {
3562 /* Slow path. We need to fetch more data from the client. */
3563 if (!drflac__reload_cache(bs)) {
3564 return DRFLAC_FALSE;
3565 }
3566
3567 bs_cache = bs->cache;
3568 bs_consumedBits = bs->consumedBits;
3569 }
3570
3571 lzcount = drflac__clz(bs_cache);
3572 zeroCounter += lzcount;
3573
3574 if (lzcount < sizeof(bs_cache)*8) {
3575 break;
3576 }
3577 }
3578
3579 pZeroCounterOut[0] = zeroCounter;
3580 goto extract_rice_param_part;
3581 }
3582
3583 /* Make sure the cache is restored at the end of it all. */
3584 bs->cache = bs_cache;
3585 bs->consumedBits = bs_consumedBits;
3586
3587 return DRFLAC_TRUE;
3588}
3589
3590static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
3591{
3592 drflac_uint32 riceParamPlus1 = riceParam + 1;
3593 drflac_uint32 riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
3594
3595 /*
3596 The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
3597 no idea how this will work in practice...
3598 */
3599 drflac_cache_t bs_cache = bs->cache;
3600 drflac_uint32 bs_consumedBits = bs->consumedBits;
3601
3602 /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
3603 drflac_uint32 lzcount = drflac__clz(bs_cache);
3604 if (lzcount < sizeof(bs_cache)*8) {
3605 /*
3606 It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
3607 this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
3608 outside of this function at a higher level.
3609 */
3610 extract_rice_param_part:
3611 bs_cache <<= lzcount;
3612 bs_consumedBits += lzcount;
3613
3614 if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
3615 /* Getting here means the rice parameter part is wholly contained within the current cache line. */
3616 bs_cache <<= riceParamPlus1;
3617 bs_consumedBits += riceParamPlus1;
3618 } else {
3619 /*
3620 Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
3621 line, reload the cache, and then combine it with the head of the next cache line.
3622 */
3623
3624 /* Before reloading the cache we need to grab the size in bits of the low part. */
3625 drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
3626 DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
3627
3628 /* Now reload the cache. */
3629 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3630 #ifndef DR_FLAC_NO_CRC
3631 drflac__update_crc16(bs);
3632 #endif
3633 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3634 bs_consumedBits = riceParamPartLoBitCount;
3635 #ifndef DR_FLAC_NO_CRC
3636 bs->crc16Cache = bs_cache;
3637 #endif
3638 } else {
3639 /* Slow path. We need to fetch more data from the client. */
3640 if (!drflac__reload_cache(bs)) {
3641 return DRFLAC_FALSE;
3642 }
3643
9e052883 3644 if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
3645 /* This happens when we get to end of stream */
3646 return DRFLAC_FALSE;
3647 }
3648
2ff0b512 3649 bs_cache = bs->cache;
3650 bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
3651 }
3652
3653 bs_cache <<= riceParamPartLoBitCount;
3654 }
3655 } else {
3656 /*
3657 Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
3658 to drflac__clz() and we need to reload the cache.
3659 */
3660 for (;;) {
3661 if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
3662 #ifndef DR_FLAC_NO_CRC
3663 drflac__update_crc16(bs);
3664 #endif
3665 bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
3666 bs_consumedBits = 0;
3667 #ifndef DR_FLAC_NO_CRC
3668 bs->crc16Cache = bs_cache;
3669 #endif
3670 } else {
3671 /* Slow path. We need to fetch more data from the client. */
3672 if (!drflac__reload_cache(bs)) {
3673 return DRFLAC_FALSE;
3674 }
3675
3676 bs_cache = bs->cache;
3677 bs_consumedBits = bs->consumedBits;
3678 }
3679
3680 lzcount = drflac__clz(bs_cache);
3681 if (lzcount < sizeof(bs_cache)*8) {
3682 break;
3683 }
3684 }
3685
3686 goto extract_rice_param_part;
3687 }
3688
3689 /* Make sure the cache is restored at the end of it all. */
3690 bs->cache = bs_cache;
3691 bs->consumedBits = bs_consumedBits;
3692
3693 return DRFLAC_TRUE;
3694}
3695
3696
3697static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3698{
3699 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3700 drflac_uint32 zeroCountPart0;
3701 drflac_uint32 riceParamPart0;
3702 drflac_uint32 riceParamMask;
3703 drflac_uint32 i;
3704
3705 DRFLAC_ASSERT(bs != NULL);
2ff0b512 3706 DRFLAC_ASSERT(pSamplesOut != NULL);
3707
3708 (void)bitsPerSample;
3709 (void)order;
3710 (void)shift;
3711 (void)coefficients;
3712
3713 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3714
3715 i = 0;
3716 while (i < count) {
3717 /* Rice extraction. */
3718 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3719 return DRFLAC_FALSE;
3720 }
3721
3722 /* Rice reconstruction. */
3723 riceParamPart0 &= riceParamMask;
3724 riceParamPart0 |= (zeroCountPart0 << riceParam);
3725 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3726
3727 pSamplesOut[i] = riceParamPart0;
3728
3729 i += 1;
3730 }
3731
3732 return DRFLAC_TRUE;
3733}
3734
9e052883 3735static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 3736{
3737 drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3738 drflac_uint32 zeroCountPart0 = 0;
3739 drflac_uint32 zeroCountPart1 = 0;
3740 drflac_uint32 zeroCountPart2 = 0;
3741 drflac_uint32 zeroCountPart3 = 0;
3742 drflac_uint32 riceParamPart0 = 0;
3743 drflac_uint32 riceParamPart1 = 0;
3744 drflac_uint32 riceParamPart2 = 0;
3745 drflac_uint32 riceParamPart3 = 0;
3746 drflac_uint32 riceParamMask;
3747 const drflac_int32* pSamplesOutEnd;
3748 drflac_uint32 i;
3749
3750 DRFLAC_ASSERT(bs != NULL);
2ff0b512 3751 DRFLAC_ASSERT(pSamplesOut != NULL);
3752
9e052883 3753 if (lpcOrder == 0) {
3754 return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 3755 }
3756
3757 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3758 pSamplesOutEnd = pSamplesOut + (count & ~3);
3759
9e052883 3760 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
2ff0b512 3761 while (pSamplesOut < pSamplesOutEnd) {
3762 /*
3763 Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
3764 against an array. Not sure why, but perhaps it's making more efficient use of registers?
3765 */
3766 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3767 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3768 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3769 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3770 return DRFLAC_FALSE;
3771 }
3772
3773 riceParamPart0 &= riceParamMask;
3774 riceParamPart1 &= riceParamMask;
3775 riceParamPart2 &= riceParamMask;
3776 riceParamPart3 &= riceParamMask;
3777
3778 riceParamPart0 |= (zeroCountPart0 << riceParam);
3779 riceParamPart1 |= (zeroCountPart1 << riceParam);
3780 riceParamPart2 |= (zeroCountPart2 << riceParam);
3781 riceParamPart3 |= (zeroCountPart3 << riceParam);
3782
3783 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3784 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3785 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3786 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3787
9e052883 3788 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3789 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3790 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3791 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
2ff0b512 3792
3793 pSamplesOut += 4;
3794 }
3795 } else {
3796 while (pSamplesOut < pSamplesOutEnd) {
3797 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
3798 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
3799 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
3800 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
3801 return DRFLAC_FALSE;
3802 }
3803
3804 riceParamPart0 &= riceParamMask;
3805 riceParamPart1 &= riceParamMask;
3806 riceParamPart2 &= riceParamMask;
3807 riceParamPart3 &= riceParamMask;
3808
3809 riceParamPart0 |= (zeroCountPart0 << riceParam);
3810 riceParamPart1 |= (zeroCountPart1 << riceParam);
3811 riceParamPart2 |= (zeroCountPart2 << riceParam);
3812 riceParamPart3 |= (zeroCountPart3 << riceParam);
3813
3814 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3815 riceParamPart1 = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
3816 riceParamPart2 = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
3817 riceParamPart3 = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
3818
9e052883 3819 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
3820 pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
3821 pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
3822 pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
2ff0b512 3823
3824 pSamplesOut += 4;
3825 }
3826 }
3827
3828 i = (count & ~3);
3829 while (i < count) {
3830 /* Rice extraction. */
3831 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
3832 return DRFLAC_FALSE;
3833 }
3834
3835 /* Rice reconstruction. */
3836 riceParamPart0 &= riceParamMask;
3837 riceParamPart0 |= (zeroCountPart0 << riceParam);
3838 riceParamPart0 = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
3839 /*riceParamPart0 = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
3840
3841 /* Sample reconstruction. */
9e052883 3842 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
3843 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
2ff0b512 3844 } else {
9e052883 3845 pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
2ff0b512 3846 }
3847
3848 i += 1;
3849 pSamplesOut += 1;
3850 }
3851
3852 return DRFLAC_TRUE;
3853}
3854
3855#if defined(DRFLAC_SUPPORT_SSE2)
3856static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
3857{
3858 __m128i r;
3859
3860 /* Pack. */
3861 r = _mm_packs_epi32(a, b);
3862
3863 /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
3864 r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
3865
3866 /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
3867 r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3868 r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
3869
3870 return r;
3871}
3872#endif
3873
3874#if defined(DRFLAC_SUPPORT_SSE41)
3875static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
3876{
3877 return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
3878}
3879
3880static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
3881{
3882 __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3883 __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
3884 return _mm_add_epi32(x64, x32);
3885}
3886
3887static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
3888{
3889 return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
3890}
3891
3892static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
3893{
3894 /*
3895 To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
3896 is shifted with zero bits, whereas the right side is shifted with sign bits.
3897 */
3898 __m128i lo = _mm_srli_epi64(x, count);
3899 __m128i hi = _mm_srai_epi32(x, count);
3900
3901 hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0)); /* The high part needs to have the low part cleared. */
3902
3903 return _mm_or_si128(lo, hi);
3904}
3905
3906static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
3907{
3908 int i;
3909 drflac_uint32 riceParamMask;
3910 drflac_int32* pDecodedSamples = pSamplesOut;
3911 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
3912 drflac_uint32 zeroCountParts0 = 0;
3913 drflac_uint32 zeroCountParts1 = 0;
3914 drflac_uint32 zeroCountParts2 = 0;
3915 drflac_uint32 zeroCountParts3 = 0;
3916 drflac_uint32 riceParamParts0 = 0;
3917 drflac_uint32 riceParamParts1 = 0;
3918 drflac_uint32 riceParamParts2 = 0;
3919 drflac_uint32 riceParamParts3 = 0;
3920 __m128i coefficients128_0;
3921 __m128i coefficients128_4;
3922 __m128i coefficients128_8;
3923 __m128i samples128_0;
3924 __m128i samples128_4;
3925 __m128i samples128_8;
3926 __m128i riceParamMask128;
3927
3928 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
3929
3930 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
3931 riceParamMask128 = _mm_set1_epi32(riceParamMask);
3932
3933 /* Pre-load. */
3934 coefficients128_0 = _mm_setzero_si128();
3935 coefficients128_4 = _mm_setzero_si128();
3936 coefficients128_8 = _mm_setzero_si128();
3937
3938 samples128_0 = _mm_setzero_si128();
3939 samples128_4 = _mm_setzero_si128();
3940 samples128_8 = _mm_setzero_si128();
3941
3942 /*
3943 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
3944 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
3945 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
3946 so I think there's opportunity for this to be simplified.
3947 */
3948#if 1
3949 {
3950 int runningOrder = order;
3951
3952 /* 0 - 3. */
3953 if (runningOrder >= 4) {
3954 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
3955 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
3956 runningOrder -= 4;
3957 } else {
3958 switch (runningOrder) {
3959 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
3960 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
3961 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
3962 }
3963 runningOrder = 0;
3964 }
3965
3966 /* 4 - 7 */
3967 if (runningOrder >= 4) {
3968 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
3969 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
3970 runningOrder -= 4;
3971 } else {
3972 switch (runningOrder) {
3973 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
3974 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
3975 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
3976 }
3977 runningOrder = 0;
3978 }
3979
3980 /* 8 - 11 */
3981 if (runningOrder == 4) {
3982 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
3983 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
3984 runningOrder -= 4;
3985 } else {
3986 switch (runningOrder) {
3987 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
3988 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
3989 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
3990 }
3991 runningOrder = 0;
3992 }
3993
3994 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
3995 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
3996 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
3997 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
3998 }
3999#else
4000 /* This causes strict-aliasing warnings with GCC. */
4001 switch (order)
4002 {
4003 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4004 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4005 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4006 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4007 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4008 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4009 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4010 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4011 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4012 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4013 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4014 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4015 }
4016#endif
4017
4018 /* For this version we are doing one sample at a time. */
4019 while (pDecodedSamples < pDecodedSamplesEnd) {
4020 __m128i prediction128;
4021 __m128i zeroCountPart128;
4022 __m128i riceParamPart128;
4023
4024 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4025 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4026 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4027 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4028 return DRFLAC_FALSE;
4029 }
4030
4031 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4032 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4033
4034 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4035 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4036 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01))); /* <-- SSE2 compatible */
4037 /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/ /* <-- Only supported from SSE4.1 and is slower in my testing... */
4038
4039 if (order <= 4) {
4040 for (i = 0; i < 4; i += 1) {
4041 prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
4042
4043 /* Horizontal add and shift. */
4044 prediction128 = drflac__mm_hadd_epi32(prediction128);
4045 prediction128 = _mm_srai_epi32(prediction128, shift);
4046 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4047
4048 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4049 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4050 }
4051 } else if (order <= 8) {
4052 for (i = 0; i < 4; i += 1) {
4053 prediction128 = _mm_mullo_epi32(coefficients128_4, samples128_4);
4054 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
4055
4056 /* Horizontal add and shift. */
4057 prediction128 = drflac__mm_hadd_epi32(prediction128);
4058 prediction128 = _mm_srai_epi32(prediction128, shift);
4059 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4060
4061 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4062 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4063 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4064 }
4065 } else {
4066 for (i = 0; i < 4; i += 1) {
4067 prediction128 = _mm_mullo_epi32(coefficients128_8, samples128_8);
4068 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
4069 prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
4070
4071 /* Horizontal add and shift. */
4072 prediction128 = drflac__mm_hadd_epi32(prediction128);
4073 prediction128 = _mm_srai_epi32(prediction128, shift);
4074 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4075
4076 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4077 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4078 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4079 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4080 }
4081 }
4082
4083 /* We store samples in groups of 4. */
4084 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4085 pDecodedSamples += 4;
4086 }
4087
4088 /* Make sure we process the last few samples. */
4089 i = (count & ~3);
4090 while (i < (int)count) {
4091 /* Rice extraction. */
4092 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4093 return DRFLAC_FALSE;
4094 }
4095
4096 /* Rice reconstruction. */
4097 riceParamParts0 &= riceParamMask;
4098 riceParamParts0 |= (zeroCountParts0 << riceParam);
4099 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4100
4101 /* Sample reconstruction. */
4102 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4103
4104 i += 1;
4105 pDecodedSamples += 1;
4106 }
4107
4108 return DRFLAC_TRUE;
4109}
4110
4111static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4112{
4113 int i;
4114 drflac_uint32 riceParamMask;
4115 drflac_int32* pDecodedSamples = pSamplesOut;
4116 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4117 drflac_uint32 zeroCountParts0 = 0;
4118 drflac_uint32 zeroCountParts1 = 0;
4119 drflac_uint32 zeroCountParts2 = 0;
4120 drflac_uint32 zeroCountParts3 = 0;
4121 drflac_uint32 riceParamParts0 = 0;
4122 drflac_uint32 riceParamParts1 = 0;
4123 drflac_uint32 riceParamParts2 = 0;
4124 drflac_uint32 riceParamParts3 = 0;
4125 __m128i coefficients128_0;
4126 __m128i coefficients128_4;
4127 __m128i coefficients128_8;
4128 __m128i samples128_0;
4129 __m128i samples128_4;
4130 __m128i samples128_8;
4131 __m128i prediction128;
4132 __m128i riceParamMask128;
4133
4134 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4135
4136 DRFLAC_ASSERT(order <= 12);
4137
4138 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
4139 riceParamMask128 = _mm_set1_epi32(riceParamMask);
4140
4141 prediction128 = _mm_setzero_si128();
4142
4143 /* Pre-load. */
4144 coefficients128_0 = _mm_setzero_si128();
4145 coefficients128_4 = _mm_setzero_si128();
4146 coefficients128_8 = _mm_setzero_si128();
4147
4148 samples128_0 = _mm_setzero_si128();
4149 samples128_4 = _mm_setzero_si128();
4150 samples128_8 = _mm_setzero_si128();
4151
4152#if 1
4153 {
4154 int runningOrder = order;
4155
4156 /* 0 - 3. */
4157 if (runningOrder >= 4) {
4158 coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
4159 samples128_0 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 4));
4160 runningOrder -= 4;
4161 } else {
4162 switch (runningOrder) {
4163 case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
4164 case 2: coefficients128_0 = _mm_set_epi32(0, 0, coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0, 0); break;
4165 case 1: coefficients128_0 = _mm_set_epi32(0, 0, 0, coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0, 0, 0); break;
4166 }
4167 runningOrder = 0;
4168 }
4169
4170 /* 4 - 7 */
4171 if (runningOrder >= 4) {
4172 coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
4173 samples128_4 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 8));
4174 runningOrder -= 4;
4175 } else {
4176 switch (runningOrder) {
4177 case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
4178 case 2: coefficients128_4 = _mm_set_epi32(0, 0, coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0, 0); break;
4179 case 1: coefficients128_4 = _mm_set_epi32(0, 0, 0, coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0, 0, 0); break;
4180 }
4181 runningOrder = 0;
4182 }
4183
4184 /* 8 - 11 */
4185 if (runningOrder == 4) {
4186 coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
4187 samples128_8 = _mm_loadu_si128((const __m128i*)(pSamplesOut - 12));
4188 runningOrder -= 4;
4189 } else {
4190 switch (runningOrder) {
4191 case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
4192 case 2: coefficients128_8 = _mm_set_epi32(0, 0, coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0, 0); break;
4193 case 1: coefficients128_8 = _mm_set_epi32(0, 0, 0, coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0, 0, 0); break;
4194 }
4195 runningOrder = 0;
4196 }
4197
4198 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4199 coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
4200 coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
4201 coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
4202 }
4203#else
4204 switch (order)
4205 {
4206 case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
4207 case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
4208 case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
4209 case 9: ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
4210 case 8: ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
4211 case 7: ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
4212 case 6: ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
4213 case 5: ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
4214 case 4: ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
4215 case 3: ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
4216 case 2: ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
4217 case 1: ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
4218 }
4219#endif
4220
4221 /* For this version we are doing one sample at a time. */
4222 while (pDecodedSamples < pDecodedSamplesEnd) {
4223 __m128i zeroCountPart128;
4224 __m128i riceParamPart128;
4225
4226 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
4227 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
4228 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
4229 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
4230 return DRFLAC_FALSE;
4231 }
4232
4233 zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
4234 riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
4235
4236 riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
4237 riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
4238 riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
4239
4240 for (i = 0; i < 4; i += 1) {
4241 prediction128 = _mm_xor_si128(prediction128, prediction128); /* Reset to 0. */
4242
4243 switch (order)
4244 {
4245 case 12:
4246 case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
4247 case 10:
4248 case 9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
4249 case 8:
4250 case 7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
4251 case 6:
4252 case 5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
4253 case 4:
4254 case 3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
4255 case 2:
4256 case 1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
4257 }
4258
4259 /* Horizontal add and shift. */
4260 prediction128 = drflac__mm_hadd_epi64(prediction128);
4261 prediction128 = drflac__mm_srai_epi64(prediction128, shift);
4262 prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
4263
4264 /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
4265 samples128_8 = _mm_alignr_epi8(samples128_4, samples128_8, 4);
4266 samples128_4 = _mm_alignr_epi8(samples128_0, samples128_4, 4);
4267 samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
4268
4269 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4270 riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
4271 }
4272
4273 /* We store samples in groups of 4. */
4274 _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
4275 pDecodedSamples += 4;
4276 }
4277
4278 /* Make sure we process the last few samples. */
4279 i = (count & ~3);
4280 while (i < (int)count) {
4281 /* Rice extraction. */
4282 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
4283 return DRFLAC_FALSE;
4284 }
4285
4286 /* Rice reconstruction. */
4287 riceParamParts0 &= riceParamMask;
4288 riceParamParts0 |= (zeroCountParts0 << riceParam);
4289 riceParamParts0 = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
4290
4291 /* Sample reconstruction. */
4292 pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4293
4294 i += 1;
4295 pDecodedSamples += 1;
4296 }
4297
4298 return DRFLAC_TRUE;
4299}
4300
9e052883 4301static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4302{
4303 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4304 DRFLAC_ASSERT(pSamplesOut != NULL);
4305
4306 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
9e052883 4307 if (lpcOrder > 0 && lpcOrder <= 12) {
4308 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4309 return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4310 } else {
9e052883 4311 return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4312 }
4313 } else {
9e052883 4314 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4315 }
4316}
4317#endif
4318
4319#if defined(DRFLAC_SUPPORT_NEON)
4320static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
4321{
4322 vst1q_s32(p+0, x.val[0]);
4323 vst1q_s32(p+4, x.val[1]);
4324}
4325
4326static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
4327{
4328 vst1q_u32(p+0, x.val[0]);
4329 vst1q_u32(p+4, x.val[1]);
4330}
4331
4332static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
4333{
4334 vst1q_f32(p+0, x.val[0]);
4335 vst1q_f32(p+4, x.val[1]);
4336}
4337
4338static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
4339{
4340 vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
4341}
4342
4343static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
4344{
4345 vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
4346}
4347
4348static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
4349{
4350 drflac_int32 x[4];
4351 x[3] = x3;
4352 x[2] = x2;
4353 x[1] = x1;
4354 x[0] = x0;
4355 return vld1q_s32(x);
4356}
4357
4358static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
4359{
4360 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4361
4362 /* Reference */
4363 /*return drflac__vdupq_n_s32x4(
4364 vgetq_lane_s32(a, 0),
4365 vgetq_lane_s32(b, 3),
4366 vgetq_lane_s32(b, 2),
4367 vgetq_lane_s32(b, 1)
4368 );*/
4369
4370 return vextq_s32(b, a, 1);
4371}
4372
4373static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
4374{
4375 /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
4376
4377 /* Reference */
4378 /*return drflac__vdupq_n_s32x4(
4379 vgetq_lane_s32(a, 0),
4380 vgetq_lane_s32(b, 3),
4381 vgetq_lane_s32(b, 2),
4382 vgetq_lane_s32(b, 1)
4383 );*/
4384
4385 return vextq_u32(b, a, 1);
4386}
4387
4388static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
4389{
4390 /* The sum must end up in position 0. */
4391
4392 /* Reference */
4393 /*return vdupq_n_s32(
4394 vgetq_lane_s32(x, 3) +
4395 vgetq_lane_s32(x, 2) +
4396 vgetq_lane_s32(x, 1) +
4397 vgetq_lane_s32(x, 0)
4398 );*/
4399
4400 int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
4401 return vpadd_s32(r, r);
4402}
4403
4404static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
4405{
4406 return vadd_s64(vget_high_s64(x), vget_low_s64(x));
4407}
4408
4409static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
4410{
4411 /* Reference */
4412 /*return drflac__vdupq_n_s32x4(
4413 vgetq_lane_s32(x, 0),
4414 vgetq_lane_s32(x, 1),
4415 vgetq_lane_s32(x, 2),
4416 vgetq_lane_s32(x, 3)
4417 );*/
4418
4419 return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
4420}
4421
4422static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
4423{
4424 return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
4425}
4426
4427static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
4428{
4429 return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
4430}
4431
4432static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4433{
4434 int i;
4435 drflac_uint32 riceParamMask;
4436 drflac_int32* pDecodedSamples = pSamplesOut;
4437 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4438 drflac_uint32 zeroCountParts[4];
4439 drflac_uint32 riceParamParts[4];
4440 int32x4_t coefficients128_0;
4441 int32x4_t coefficients128_4;
4442 int32x4_t coefficients128_8;
4443 int32x4_t samples128_0;
4444 int32x4_t samples128_4;
4445 int32x4_t samples128_8;
4446 uint32x4_t riceParamMask128;
4447 int32x4_t riceParam128;
4448 int32x2_t shift64;
4449 uint32x4_t one128;
4450
4451 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4452
9e052883 4453 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
2ff0b512 4454 riceParamMask128 = vdupq_n_u32(riceParamMask);
4455
4456 riceParam128 = vdupq_n_s32(riceParam);
4457 shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4458 one128 = vdupq_n_u32(1);
4459
4460 /*
4461 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
4462 what's available in the input buffers. It would be conenient to use a fall-through switch to do this, but this results
4463 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4464 so I think there's opportunity for this to be simplified.
4465 */
4466 {
4467 int runningOrder = order;
4468 drflac_int32 tempC[4] = {0, 0, 0, 0};
4469 drflac_int32 tempS[4] = {0, 0, 0, 0};
4470
4471 /* 0 - 3. */
4472 if (runningOrder >= 4) {
4473 coefficients128_0 = vld1q_s32(coefficients + 0);
4474 samples128_0 = vld1q_s32(pSamplesOut - 4);
4475 runningOrder -= 4;
4476 } else {
4477 switch (runningOrder) {
4478 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4479 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4480 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4481 }
4482
4483 coefficients128_0 = vld1q_s32(tempC);
4484 samples128_0 = vld1q_s32(tempS);
4485 runningOrder = 0;
4486 }
4487
4488 /* 4 - 7 */
4489 if (runningOrder >= 4) {
4490 coefficients128_4 = vld1q_s32(coefficients + 4);
4491 samples128_4 = vld1q_s32(pSamplesOut - 8);
4492 runningOrder -= 4;
4493 } else {
4494 switch (runningOrder) {
4495 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4496 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4497 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4498 }
4499
4500 coefficients128_4 = vld1q_s32(tempC);
4501 samples128_4 = vld1q_s32(tempS);
4502 runningOrder = 0;
4503 }
4504
4505 /* 8 - 11 */
4506 if (runningOrder == 4) {
4507 coefficients128_8 = vld1q_s32(coefficients + 8);
4508 samples128_8 = vld1q_s32(pSamplesOut - 12);
4509 runningOrder -= 4;
4510 } else {
4511 switch (runningOrder) {
4512 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4513 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4514 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4515 }
4516
4517 coefficients128_8 = vld1q_s32(tempC);
4518 samples128_8 = vld1q_s32(tempS);
4519 runningOrder = 0;
4520 }
4521
4522 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4523 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4524 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4525 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4526 }
4527
4528 /* For this version we are doing one sample at a time. */
4529 while (pDecodedSamples < pDecodedSamplesEnd) {
4530 int32x4_t prediction128;
4531 int32x2_t prediction64;
4532 uint32x4_t zeroCountPart128;
4533 uint32x4_t riceParamPart128;
4534
4535 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4536 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4537 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4538 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4539 return DRFLAC_FALSE;
4540 }
4541
4542 zeroCountPart128 = vld1q_u32(zeroCountParts);
4543 riceParamPart128 = vld1q_u32(riceParamParts);
4544
4545 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4546 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4547 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4548
4549 if (order <= 4) {
4550 for (i = 0; i < 4; i += 1) {
4551 prediction128 = vmulq_s32(coefficients128_0, samples128_0);
4552
4553 /* Horizontal add and shift. */
4554 prediction64 = drflac__vhaddq_s32(prediction128);
4555 prediction64 = vshl_s32(prediction64, shift64);
4556 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4557
4558 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4559 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4560 }
4561 } else if (order <= 8) {
4562 for (i = 0; i < 4; i += 1) {
4563 prediction128 = vmulq_s32(coefficients128_4, samples128_4);
4564 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4565
4566 /* Horizontal add and shift. */
4567 prediction64 = drflac__vhaddq_s32(prediction128);
4568 prediction64 = vshl_s32(prediction64, shift64);
4569 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4570
4571 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4572 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4573 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4574 }
4575 } else {
4576 for (i = 0; i < 4; i += 1) {
4577 prediction128 = vmulq_s32(coefficients128_8, samples128_8);
4578 prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
4579 prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
4580
4581 /* Horizontal add and shift. */
4582 prediction64 = drflac__vhaddq_s32(prediction128);
4583 prediction64 = vshl_s32(prediction64, shift64);
4584 prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
4585
4586 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4587 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4588 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
4589 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4590 }
4591 }
4592
4593 /* We store samples in groups of 4. */
4594 vst1q_s32(pDecodedSamples, samples128_0);
4595 pDecodedSamples += 4;
4596 }
4597
4598 /* Make sure we process the last few samples. */
4599 i = (count & ~3);
4600 while (i < (int)count) {
4601 /* Rice extraction. */
4602 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4603 return DRFLAC_FALSE;
4604 }
4605
4606 /* Rice reconstruction. */
4607 riceParamParts[0] &= riceParamMask;
4608 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4609 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4610
4611 /* Sample reconstruction. */
4612 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
4613
4614 i += 1;
4615 pDecodedSamples += 1;
4616 }
4617
4618 return DRFLAC_TRUE;
4619}
4620
4621static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
4622{
4623 int i;
4624 drflac_uint32 riceParamMask;
4625 drflac_int32* pDecodedSamples = pSamplesOut;
4626 drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
4627 drflac_uint32 zeroCountParts[4];
4628 drflac_uint32 riceParamParts[4];
4629 int32x4_t coefficients128_0;
4630 int32x4_t coefficients128_4;
4631 int32x4_t coefficients128_8;
4632 int32x4_t samples128_0;
4633 int32x4_t samples128_4;
4634 int32x4_t samples128_8;
4635 uint32x4_t riceParamMask128;
4636 int32x4_t riceParam128;
4637 int64x1_t shift64;
4638 uint32x4_t one128;
9e052883 4639 int64x2_t prediction128 = { 0 };
4640 uint32x4_t zeroCountPart128;
4641 uint32x4_t riceParamPart128;
2ff0b512 4642
4643 const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
4644
9e052883 4645 riceParamMask = (drflac_uint32)~((~0UL) << riceParam);
2ff0b512 4646 riceParamMask128 = vdupq_n_u32(riceParamMask);
4647
4648 riceParam128 = vdupq_n_s32(riceParam);
4649 shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshlq_s32(). */
4650 one128 = vdupq_n_u32(1);
4651
4652 /*
4653 Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
9e052883 4654 what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
2ff0b512 4655 in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
4656 so I think there's opportunity for this to be simplified.
4657 */
4658 {
4659 int runningOrder = order;
4660 drflac_int32 tempC[4] = {0, 0, 0, 0};
4661 drflac_int32 tempS[4] = {0, 0, 0, 0};
4662
4663 /* 0 - 3. */
4664 if (runningOrder >= 4) {
4665 coefficients128_0 = vld1q_s32(coefficients + 0);
4666 samples128_0 = vld1q_s32(pSamplesOut - 4);
4667 runningOrder -= 4;
4668 } else {
4669 switch (runningOrder) {
4670 case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
4671 case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
4672 case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
4673 }
4674
4675 coefficients128_0 = vld1q_s32(tempC);
4676 samples128_0 = vld1q_s32(tempS);
4677 runningOrder = 0;
4678 }
4679
4680 /* 4 - 7 */
4681 if (runningOrder >= 4) {
4682 coefficients128_4 = vld1q_s32(coefficients + 4);
4683 samples128_4 = vld1q_s32(pSamplesOut - 8);
4684 runningOrder -= 4;
4685 } else {
4686 switch (runningOrder) {
4687 case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
4688 case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
4689 case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
4690 }
4691
4692 coefficients128_4 = vld1q_s32(tempC);
4693 samples128_4 = vld1q_s32(tempS);
4694 runningOrder = 0;
4695 }
4696
4697 /* 8 - 11 */
4698 if (runningOrder == 4) {
4699 coefficients128_8 = vld1q_s32(coefficients + 8);
4700 samples128_8 = vld1q_s32(pSamplesOut - 12);
4701 runningOrder -= 4;
4702 } else {
4703 switch (runningOrder) {
4704 case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
4705 case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
4706 case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
4707 }
4708
4709 coefficients128_8 = vld1q_s32(tempC);
4710 samples128_8 = vld1q_s32(tempS);
4711 runningOrder = 0;
4712 }
4713
4714 /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
4715 coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
4716 coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
4717 coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
4718 }
4719
4720 /* For this version we are doing one sample at a time. */
4721 while (pDecodedSamples < pDecodedSamplesEnd) {
2ff0b512 4722 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
4723 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
4724 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
4725 !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
4726 return DRFLAC_FALSE;
4727 }
4728
4729 zeroCountPart128 = vld1q_u32(zeroCountParts);
4730 riceParamPart128 = vld1q_u32(riceParamParts);
4731
4732 riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
4733 riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
4734 riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
4735
4736 for (i = 0; i < 4; i += 1) {
4737 int64x1_t prediction64;
4738
4739 prediction128 = veorq_s64(prediction128, prediction128); /* Reset to 0. */
4740 switch (order)
4741 {
4742 case 12:
4743 case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
4744 case 10:
4745 case 9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
4746 case 8:
4747 case 7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
4748 case 6:
4749 case 5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
4750 case 4:
4751 case 3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
4752 case 2:
4753 case 1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
4754 }
4755
4756 /* Horizontal add and shift. */
4757 prediction64 = drflac__vhaddq_s64(prediction128);
4758 prediction64 = vshl_s64(prediction64, shift64);
4759 prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
4760
4761 /* Our value should be sitting in prediction64[0]. We need to combine this with our SSE samples. */
4762 samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
4763 samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
4764 samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
4765
4766 /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
4767 riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
4768 }
4769
4770 /* We store samples in groups of 4. */
4771 vst1q_s32(pDecodedSamples, samples128_0);
4772 pDecodedSamples += 4;
4773 }
4774
4775 /* Make sure we process the last few samples. */
4776 i = (count & ~3);
4777 while (i < (int)count) {
4778 /* Rice extraction. */
4779 if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
4780 return DRFLAC_FALSE;
4781 }
4782
4783 /* Rice reconstruction. */
4784 riceParamParts[0] &= riceParamMask;
4785 riceParamParts[0] |= (zeroCountParts[0] << riceParam);
4786 riceParamParts[0] = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
4787
4788 /* Sample reconstruction. */
4789 pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
4790
4791 i += 1;
4792 pDecodedSamples += 1;
4793 }
4794
4795 return DRFLAC_TRUE;
4796}
4797
9e052883 4798static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4799{
4800 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4801 DRFLAC_ASSERT(pSamplesOut != NULL);
4802
4803 /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
9e052883 4804 if (lpcOrder > 0 && lpcOrder <= 12) {
4805 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4806 return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4807 } else {
9e052883 4808 return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
2ff0b512 4809 }
4810 } else {
9e052883 4811 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4812 }
4813}
4814#endif
4815
9e052883 4816static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4817{
4818#if defined(DRFLAC_SUPPORT_SSE41)
4819 if (drflac__gIsSSE41Supported) {
9e052883 4820 return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4821 } else
4822#elif defined(DRFLAC_SUPPORT_NEON)
4823 if (drflac__gIsNEONSupported) {
9e052883 4824 return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
2ff0b512 4825 } else
4826#endif
4827 {
4828 /* Scalar fallback. */
9e052883 4829 #if 0
4830 return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4831 #else
4832 return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
4833 #endif
2ff0b512 4834 }
4835}
4836
4837/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
4838static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
4839{
4840 drflac_uint32 i;
4841
4842 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4843
4844 for (i = 0; i < count; ++i) {
4845 if (!drflac__seek_rice_parts(bs, riceParam)) {
4846 return DRFLAC_FALSE;
4847 }
4848 }
4849
4850 return DRFLAC_TRUE;
4851}
4852
9e052883 4853#if defined(__clang__)
4854__attribute__((no_sanitize("signed-integer-overflow")))
4855#endif
4856static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
2ff0b512 4857{
4858 drflac_uint32 i;
4859
4860 DRFLAC_ASSERT(bs != NULL);
2ff0b512 4861 DRFLAC_ASSERT(unencodedBitsPerSample <= 31); /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
4862 DRFLAC_ASSERT(pSamplesOut != NULL);
4863
4864 for (i = 0; i < count; ++i) {
4865 if (unencodedBitsPerSample > 0) {
4866 if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
4867 return DRFLAC_FALSE;
4868 }
4869 } else {
4870 pSamplesOut[i] = 0;
4871 }
4872
9e052883 4873 if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
4874 pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
2ff0b512 4875 } else {
9e052883 4876 pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
2ff0b512 4877 }
4878 }
4879
4880 return DRFLAC_TRUE;
4881}
4882
4883
4884/*
4885Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
4886when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
4887<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4888*/
9e052883 4889static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
2ff0b512 4890{
4891 drflac_uint8 residualMethod;
4892 drflac_uint8 partitionOrder;
4893 drflac_uint32 samplesInPartition;
4894 drflac_uint32 partitionsRemaining;
4895
4896 DRFLAC_ASSERT(bs != NULL);
4897 DRFLAC_ASSERT(blockSize != 0);
4898 DRFLAC_ASSERT(pDecodedSamples != NULL); /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
4899
4900 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4901 return DRFLAC_FALSE;
4902 }
4903
4904 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4905 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
4906 }
4907
4908 /* Ignore the first <order> values. */
9e052883 4909 pDecodedSamples += lpcOrder;
2ff0b512 4910
4911 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
4912 return DRFLAC_FALSE;
4913 }
4914
4915 /*
4916 From the FLAC spec:
4917 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
4918 */
4919 if (partitionOrder > 8) {
4920 return DRFLAC_FALSE;
4921 }
4922
4923 /* Validation check. */
9e052883 4924 if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
2ff0b512 4925 return DRFLAC_FALSE;
4926 }
4927
9e052883 4928 samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
2ff0b512 4929 partitionsRemaining = (1 << partitionOrder);
4930 for (;;) {
4931 drflac_uint8 riceParam = 0;
4932 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
4933 if (!drflac__read_uint8(bs, 4, &riceParam)) {
4934 return DRFLAC_FALSE;
4935 }
4936 if (riceParam == 15) {
4937 riceParam = 0xFF;
4938 }
4939 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4940 if (!drflac__read_uint8(bs, 5, &riceParam)) {
4941 return DRFLAC_FALSE;
4942 }
4943 if (riceParam == 31) {
4944 riceParam = 0xFF;
4945 }
4946 }
4947
4948 if (riceParam != 0xFF) {
9e052883 4949 if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
2ff0b512 4950 return DRFLAC_FALSE;
4951 }
4952 } else {
4953 drflac_uint8 unencodedBitsPerSample = 0;
4954 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
4955 return DRFLAC_FALSE;
4956 }
4957
9e052883 4958 if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
2ff0b512 4959 return DRFLAC_FALSE;
4960 }
4961 }
4962
4963 pDecodedSamples += samplesInPartition;
4964
4965 if (partitionsRemaining == 1) {
4966 break;
4967 }
4968
4969 partitionsRemaining -= 1;
4970
4971 if (partitionOrder != 0) {
4972 samplesInPartition = blockSize / (1 << partitionOrder);
4973 }
4974 }
4975
4976 return DRFLAC_TRUE;
4977}
4978
4979/*
4980Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
4981when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
4982<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
4983*/
4984static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
4985{
4986 drflac_uint8 residualMethod;
4987 drflac_uint8 partitionOrder;
4988 drflac_uint32 samplesInPartition;
4989 drflac_uint32 partitionsRemaining;
4990
4991 DRFLAC_ASSERT(bs != NULL);
4992 DRFLAC_ASSERT(blockSize != 0);
4993
4994 if (!drflac__read_uint8(bs, 2, &residualMethod)) {
4995 return DRFLAC_FALSE;
4996 }
4997
4998 if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
4999 return DRFLAC_FALSE; /* Unknown or unsupported residual coding method. */
5000 }
5001
5002 if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
5003 return DRFLAC_FALSE;
5004 }
5005
5006 /*
5007 From the FLAC spec:
5008 The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
5009 */
5010 if (partitionOrder > 8) {
5011 return DRFLAC_FALSE;
5012 }
5013
5014 /* Validation check. */
5015 if ((blockSize / (1 << partitionOrder)) <= order) {
5016 return DRFLAC_FALSE;
5017 }
5018
5019 samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
5020 partitionsRemaining = (1 << partitionOrder);
5021 for (;;)
5022 {
5023 drflac_uint8 riceParam = 0;
5024 if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
5025 if (!drflac__read_uint8(bs, 4, &riceParam)) {
5026 return DRFLAC_FALSE;
5027 }
5028 if (riceParam == 15) {
5029 riceParam = 0xFF;
5030 }
5031 } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
5032 if (!drflac__read_uint8(bs, 5, &riceParam)) {
5033 return DRFLAC_FALSE;
5034 }
5035 if (riceParam == 31) {
5036 riceParam = 0xFF;
5037 }
5038 }
5039
5040 if (riceParam != 0xFF) {
5041 if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
5042 return DRFLAC_FALSE;
5043 }
5044 } else {
5045 drflac_uint8 unencodedBitsPerSample = 0;
5046 if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
5047 return DRFLAC_FALSE;
5048 }
5049
5050 if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
5051 return DRFLAC_FALSE;
5052 }
5053 }
5054
5055
5056 if (partitionsRemaining == 1) {
5057 break;
5058 }
5059
5060 partitionsRemaining -= 1;
5061 samplesInPartition = blockSize / (1 << partitionOrder);
5062 }
5063
5064 return DRFLAC_TRUE;
5065}
5066
5067
5068static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5069{
5070 drflac_uint32 i;
5071
5072 /* Only a single sample needs to be decoded here. */
5073 drflac_int32 sample;
5074 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5075 return DRFLAC_FALSE;
5076 }
5077
5078 /*
5079 We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
5080 we'll want to look at a more efficient way.
5081 */
5082 for (i = 0; i < blockSize; ++i) {
5083 pDecodedSamples[i] = sample;
5084 }
5085
5086 return DRFLAC_TRUE;
5087}
5088
5089static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
5090{
5091 drflac_uint32 i;
5092
5093 for (i = 0; i < blockSize; ++i) {
5094 drflac_int32 sample;
5095 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5096 return DRFLAC_FALSE;
5097 }
5098
5099 pDecodedSamples[i] = sample;
5100 }
5101
5102 return DRFLAC_TRUE;
5103}
5104
5105static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5106{
5107 drflac_uint32 i;
5108
5109 static drflac_int32 lpcCoefficientsTable[5][4] = {
5110 {0, 0, 0, 0},
5111 {1, 0, 0, 0},
5112 {2, -1, 0, 0},
5113 {3, -3, 1, 0},
5114 {4, -6, 4, -1}
5115 };
5116
5117 /* Warm up samples and coefficients. */
5118 for (i = 0; i < lpcOrder; ++i) {
5119 drflac_int32 sample;
5120 if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
5121 return DRFLAC_FALSE;
5122 }
5123
5124 pDecodedSamples[i] = sample;
5125 }
5126
9e052883 5127 if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
2ff0b512 5128 return DRFLAC_FALSE;
5129 }
5130
5131 return DRFLAC_TRUE;
5132}
5133
5134static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
5135{
5136 drflac_uint8 i;
5137 drflac_uint8 lpcPrecision;
5138 drflac_int8 lpcShift;
5139 drflac_int32 coefficients[32];
5140
5141 /* Warm up samples. */
5142 for (i = 0; i < lpcOrder; ++i) {
5143 drflac_int32 sample;
5144 if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
5145 return DRFLAC_FALSE;
5146 }
5147
5148 pDecodedSamples[i] = sample;
5149 }
5150
5151 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5152 return DRFLAC_FALSE;
5153 }
5154 if (lpcPrecision == 15) {
5155 return DRFLAC_FALSE; /* Invalid. */
5156 }
5157 lpcPrecision += 1;
5158
5159 if (!drflac__read_int8(bs, 5, &lpcShift)) {
5160 return DRFLAC_FALSE;
5161 }
5162
5163 /*
5164 From the FLAC specification:
5165
5166 Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
5167
5168 Emphasis on the "signed two's-complement". In practice there does not seem to be any encoders nor decoders supporting negative shifts. For now dr_flac is
5169 not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
5170 */
5171 if (lpcShift < 0) {
5172 return DRFLAC_FALSE;
5173 }
5174
5175 DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
5176 for (i = 0; i < lpcOrder; ++i) {
5177 if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
5178 return DRFLAC_FALSE;
5179 }
5180 }
5181
9e052883 5182 if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
2ff0b512 5183 return DRFLAC_FALSE;
5184 }
5185
5186 return DRFLAC_TRUE;
5187}
5188
5189
5190static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
5191{
5192 const drflac_uint32 sampleRateTable[12] = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
5193 const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1}; /* -1 = reserved. */
5194
5195 DRFLAC_ASSERT(bs != NULL);
5196 DRFLAC_ASSERT(header != NULL);
5197
5198 /* Keep looping until we find a valid sync code. */
5199 for (;;) {
5200 drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
5201 drflac_uint8 reserved = 0;
5202 drflac_uint8 blockingStrategy = 0;
5203 drflac_uint8 blockSize = 0;
5204 drflac_uint8 sampleRate = 0;
5205 drflac_uint8 channelAssignment = 0;
5206 drflac_uint8 bitsPerSample = 0;
5207 drflac_bool32 isVariableBlockSize;
5208
5209 if (!drflac__find_and_seek_to_next_sync_code(bs)) {
5210 return DRFLAC_FALSE;
5211 }
5212
5213 if (!drflac__read_uint8(bs, 1, &reserved)) {
5214 return DRFLAC_FALSE;
5215 }
5216 if (reserved == 1) {
5217 continue;
5218 }
5219 crc8 = drflac_crc8(crc8, reserved, 1);
5220
5221 if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
5222 return DRFLAC_FALSE;
5223 }
5224 crc8 = drflac_crc8(crc8, blockingStrategy, 1);
5225
5226 if (!drflac__read_uint8(bs, 4, &blockSize)) {
5227 return DRFLAC_FALSE;
5228 }
5229 if (blockSize == 0) {
5230 continue;
5231 }
5232 crc8 = drflac_crc8(crc8, blockSize, 4);
5233
5234 if (!drflac__read_uint8(bs, 4, &sampleRate)) {
5235 return DRFLAC_FALSE;
5236 }
5237 crc8 = drflac_crc8(crc8, sampleRate, 4);
5238
5239 if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
5240 return DRFLAC_FALSE;
5241 }
5242 if (channelAssignment > 10) {
5243 continue;
5244 }
5245 crc8 = drflac_crc8(crc8, channelAssignment, 4);
5246
5247 if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
5248 return DRFLAC_FALSE;
5249 }
5250 if (bitsPerSample == 3 || bitsPerSample == 7) {
5251 continue;
5252 }
5253 crc8 = drflac_crc8(crc8, bitsPerSample, 3);
5254
5255
5256 if (!drflac__read_uint8(bs, 1, &reserved)) {
5257 return DRFLAC_FALSE;
5258 }
5259 if (reserved == 1) {
5260 continue;
5261 }
5262 crc8 = drflac_crc8(crc8, reserved, 1);
5263
5264
5265 isVariableBlockSize = blockingStrategy == 1;
5266 if (isVariableBlockSize) {
5267 drflac_uint64 pcmFrameNumber;
5268 drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
5269 if (result != DRFLAC_SUCCESS) {
5270 if (result == DRFLAC_AT_END) {
5271 return DRFLAC_FALSE;
5272 } else {
5273 continue;
5274 }
5275 }
5276 header->flacFrameNumber = 0;
5277 header->pcmFrameNumber = pcmFrameNumber;
5278 } else {
5279 drflac_uint64 flacFrameNumber = 0;
5280 drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
5281 if (result != DRFLAC_SUCCESS) {
5282 if (result == DRFLAC_AT_END) {
5283 return DRFLAC_FALSE;
5284 } else {
5285 continue;
5286 }
5287 }
5288 header->flacFrameNumber = (drflac_uint32)flacFrameNumber; /* <-- Safe cast. */
5289 header->pcmFrameNumber = 0;
5290 }
5291
5292
5293 DRFLAC_ASSERT(blockSize > 0);
5294 if (blockSize == 1) {
5295 header->blockSizeInPCMFrames = 192;
5296 } else if (blockSize <= 5) {
5297 DRFLAC_ASSERT(blockSize >= 2);
5298 header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
5299 } else if (blockSize == 6) {
5300 if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
5301 return DRFLAC_FALSE;
5302 }
5303 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
5304 header->blockSizeInPCMFrames += 1;
5305 } else if (blockSize == 7) {
5306 if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
5307 return DRFLAC_FALSE;
5308 }
5309 crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
9e052883 5310 if (header->blockSizeInPCMFrames == 0xFFFF) {
5311 return DRFLAC_FALSE; /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
5312 }
2ff0b512 5313 header->blockSizeInPCMFrames += 1;
5314 } else {
5315 DRFLAC_ASSERT(blockSize >= 8);
5316 header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
5317 }
5318
5319
5320 if (sampleRate <= 11) {
5321 header->sampleRate = sampleRateTable[sampleRate];
5322 } else if (sampleRate == 12) {
5323 if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
5324 return DRFLAC_FALSE;
5325 }
5326 crc8 = drflac_crc8(crc8, header->sampleRate, 8);
5327 header->sampleRate *= 1000;
5328 } else if (sampleRate == 13) {
5329 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5330 return DRFLAC_FALSE;
5331 }
5332 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5333 } else if (sampleRate == 14) {
5334 if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
5335 return DRFLAC_FALSE;
5336 }
5337 crc8 = drflac_crc8(crc8, header->sampleRate, 16);
5338 header->sampleRate *= 10;
5339 } else {
5340 continue; /* Invalid. Assume an invalid block. */
5341 }
5342
5343
5344 header->channelAssignment = channelAssignment;
5345
5346 header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
5347 if (header->bitsPerSample == 0) {
5348 header->bitsPerSample = streaminfoBitsPerSample;
5349 }
5350
9e052883 5351 if (header->bitsPerSample != streaminfoBitsPerSample) {
5352 /* If this subframe has a different bitsPerSample then streaminfo or the first frame, reject it */
5353 return DRFLAC_FALSE;
5354 }
5355
2ff0b512 5356 if (!drflac__read_uint8(bs, 8, &header->crc8)) {
5357 return DRFLAC_FALSE;
5358 }
5359
5360#ifndef DR_FLAC_NO_CRC
5361 if (header->crc8 != crc8) {
5362 continue; /* CRC mismatch. Loop back to the top and find the next sync code. */
5363 }
5364#endif
5365 return DRFLAC_TRUE;
5366 }
5367}
5368
5369static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
5370{
5371 drflac_uint8 header;
5372 int type;
5373
5374 if (!drflac__read_uint8(bs, 8, &header)) {
5375 return DRFLAC_FALSE;
5376 }
5377
5378 /* First bit should always be 0. */
5379 if ((header & 0x80) != 0) {
5380 return DRFLAC_FALSE;
5381 }
5382
5383 type = (header & 0x7E) >> 1;
5384 if (type == 0) {
5385 pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
5386 } else if (type == 1) {
5387 pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
5388 } else {
5389 if ((type & 0x20) != 0) {
5390 pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
5391 pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
5392 } else if ((type & 0x08) != 0) {
5393 pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
5394 pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
5395 if (pSubframe->lpcOrder > 4) {
5396 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5397 pSubframe->lpcOrder = 0;
5398 }
5399 } else {
5400 pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
5401 }
5402 }
5403
5404 if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
5405 return DRFLAC_FALSE;
5406 }
5407
5408 /* Wasted bits per sample. */
5409 pSubframe->wastedBitsPerSample = 0;
5410 if ((header & 0x01) == 1) {
5411 unsigned int wastedBitsPerSample;
5412 if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
5413 return DRFLAC_FALSE;
5414 }
5415 pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
5416 }
5417
5418 return DRFLAC_TRUE;
5419}
5420
5421static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
5422{
5423 drflac_subframe* pSubframe;
5424 drflac_uint32 subframeBitsPerSample;
5425
5426 DRFLAC_ASSERT(bs != NULL);
5427 DRFLAC_ASSERT(frame != NULL);
5428
5429 pSubframe = frame->subframes + subframeIndex;
5430 if (!drflac__read_subframe_header(bs, pSubframe)) {
5431 return DRFLAC_FALSE;
5432 }
5433
5434 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5435 subframeBitsPerSample = frame->header.bitsPerSample;
5436 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5437 subframeBitsPerSample += 1;
5438 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5439 subframeBitsPerSample += 1;
5440 }
5441
9e052883 5442 if (subframeBitsPerSample > 32) {
5443 /* libFLAC and ffmpeg reject 33-bit subframes as well */
5444 return DRFLAC_FALSE;
5445 }
5446
2ff0b512 5447 /* Need to handle wasted bits per sample. */
5448 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5449 return DRFLAC_FALSE;
5450 }
5451 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5452
5453 pSubframe->pSamplesS32 = pDecodedSamplesOut;
5454
5455 switch (pSubframe->subframeType)
5456 {
5457 case DRFLAC_SUBFRAME_CONSTANT:
5458 {
5459 drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5460 } break;
5461
5462 case DRFLAC_SUBFRAME_VERBATIM:
5463 {
5464 drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
5465 } break;
5466
5467 case DRFLAC_SUBFRAME_FIXED:
5468 {
5469 drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5470 } break;
5471
5472 case DRFLAC_SUBFRAME_LPC:
5473 {
5474 drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
5475 } break;
5476
5477 default: return DRFLAC_FALSE;
5478 }
5479
5480 return DRFLAC_TRUE;
5481}
5482
5483static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
5484{
5485 drflac_subframe* pSubframe;
5486 drflac_uint32 subframeBitsPerSample;
5487
5488 DRFLAC_ASSERT(bs != NULL);
5489 DRFLAC_ASSERT(frame != NULL);
5490
5491 pSubframe = frame->subframes + subframeIndex;
5492 if (!drflac__read_subframe_header(bs, pSubframe)) {
5493 return DRFLAC_FALSE;
5494 }
5495
5496 /* Side channels require an extra bit per sample. Took a while to figure that one out... */
5497 subframeBitsPerSample = frame->header.bitsPerSample;
5498 if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
5499 subframeBitsPerSample += 1;
5500 } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
5501 subframeBitsPerSample += 1;
5502 }
5503
5504 /* Need to handle wasted bits per sample. */
5505 if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
5506 return DRFLAC_FALSE;
5507 }
5508 subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
5509
5510 pSubframe->pSamplesS32 = NULL;
5511
5512 switch (pSubframe->subframeType)
5513 {
5514 case DRFLAC_SUBFRAME_CONSTANT:
5515 {
5516 if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
5517 return DRFLAC_FALSE;
5518 }
5519 } break;
5520
5521 case DRFLAC_SUBFRAME_VERBATIM:
5522 {
5523 unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
5524 if (!drflac__seek_bits(bs, bitsToSeek)) {
5525 return DRFLAC_FALSE;
5526 }
5527 } break;
5528
5529 case DRFLAC_SUBFRAME_FIXED:
5530 {
5531 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5532 if (!drflac__seek_bits(bs, bitsToSeek)) {
5533 return DRFLAC_FALSE;
5534 }
5535
5536 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5537 return DRFLAC_FALSE;
5538 }
5539 } break;
5540
5541 case DRFLAC_SUBFRAME_LPC:
5542 {
5543 drflac_uint8 lpcPrecision;
5544
5545 unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
5546 if (!drflac__seek_bits(bs, bitsToSeek)) {
5547 return DRFLAC_FALSE;
5548 }
5549
5550 if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
5551 return DRFLAC_FALSE;
5552 }
5553 if (lpcPrecision == 15) {
5554 return DRFLAC_FALSE; /* Invalid. */
5555 }
5556 lpcPrecision += 1;
5557
5558
5559 bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5; /* +5 for shift. */
5560 if (!drflac__seek_bits(bs, bitsToSeek)) {
5561 return DRFLAC_FALSE;
5562 }
5563
5564 if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
5565 return DRFLAC_FALSE;
5566 }
5567 } break;
5568
5569 default: return DRFLAC_FALSE;
5570 }
5571
5572 return DRFLAC_TRUE;
5573}
5574
5575
5576static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
5577{
5578 drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
5579
5580 DRFLAC_ASSERT(channelAssignment <= 10);
5581 return lookup[channelAssignment];
5582}
5583
5584static drflac_result drflac__decode_flac_frame(drflac* pFlac)
5585{
5586 int channelCount;
5587 int i;
5588 drflac_uint8 paddingSizeInBits;
5589 drflac_uint16 desiredCRC16;
5590#ifndef DR_FLAC_NO_CRC
5591 drflac_uint16 actualCRC16;
5592#endif
5593
5594 /* This function should be called while the stream is sitting on the first byte after the frame header. */
5595 DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
5596
5597 /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
5598 if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
5599 return DRFLAC_ERROR;
5600 }
5601
5602 /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
5603 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5604 if (channelCount != (int)pFlac->channels) {
5605 return DRFLAC_ERROR;
5606 }
5607
5608 for (i = 0; i < channelCount; ++i) {
5609 if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
5610 return DRFLAC_ERROR;
5611 }
5612 }
5613
5614 paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
5615 if (paddingSizeInBits > 0) {
5616 drflac_uint8 padding = 0;
5617 if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
5618 return DRFLAC_AT_END;
5619 }
5620 }
5621
5622#ifndef DR_FLAC_NO_CRC
5623 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5624#endif
5625 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5626 return DRFLAC_AT_END;
5627 }
5628
5629#ifndef DR_FLAC_NO_CRC
5630 if (actualCRC16 != desiredCRC16) {
5631 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5632 }
5633#endif
5634
5635 pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5636
5637 return DRFLAC_SUCCESS;
5638}
5639
5640static drflac_result drflac__seek_flac_frame(drflac* pFlac)
5641{
5642 int channelCount;
5643 int i;
5644 drflac_uint16 desiredCRC16;
5645#ifndef DR_FLAC_NO_CRC
5646 drflac_uint16 actualCRC16;
5647#endif
5648
5649 channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
5650 for (i = 0; i < channelCount; ++i) {
5651 if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
5652 return DRFLAC_ERROR;
5653 }
5654 }
5655
5656 /* Padding. */
5657 if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
5658 return DRFLAC_ERROR;
5659 }
5660
5661 /* CRC. */
5662#ifndef DR_FLAC_NO_CRC
5663 actualCRC16 = drflac__flush_crc16(&pFlac->bs);
5664#endif
5665 if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
5666 return DRFLAC_AT_END;
5667 }
5668
5669#ifndef DR_FLAC_NO_CRC
5670 if (actualCRC16 != desiredCRC16) {
5671 return DRFLAC_CRC_MISMATCH; /* CRC mismatch. */
5672 }
5673#endif
5674
5675 return DRFLAC_SUCCESS;
5676}
5677
5678static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
5679{
5680 DRFLAC_ASSERT(pFlac != NULL);
5681
5682 for (;;) {
5683 drflac_result result;
5684
5685 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5686 return DRFLAC_FALSE;
5687 }
5688
5689 result = drflac__decode_flac_frame(pFlac);
5690 if (result != DRFLAC_SUCCESS) {
5691 if (result == DRFLAC_CRC_MISMATCH) {
5692 continue; /* CRC mismatch. Skip to the next frame. */
5693 } else {
5694 return DRFLAC_FALSE;
5695 }
5696 }
5697
5698 return DRFLAC_TRUE;
5699 }
5700}
5701
5702static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
5703{
5704 drflac_uint64 firstPCMFrame;
5705 drflac_uint64 lastPCMFrame;
5706
5707 DRFLAC_ASSERT(pFlac != NULL);
5708
5709 firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
5710 if (firstPCMFrame == 0) {
5711 firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
5712 }
5713
5714 lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
5715 if (lastPCMFrame > 0) {
5716 lastPCMFrame -= 1; /* Needs to be zero based. */
5717 }
5718
5719 if (pFirstPCMFrame) {
5720 *pFirstPCMFrame = firstPCMFrame;
5721 }
5722 if (pLastPCMFrame) {
5723 *pLastPCMFrame = lastPCMFrame;
5724 }
5725}
5726
5727static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
5728{
5729 drflac_bool32 result;
5730
5731 DRFLAC_ASSERT(pFlac != NULL);
5732
5733 result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
5734
5735 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5736 pFlac->currentPCMFrame = 0;
5737
5738 return result;
5739}
5740
5741static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
5742{
5743 /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
5744 DRFLAC_ASSERT(pFlac != NULL);
5745 return drflac__seek_flac_frame(pFlac);
5746}
5747
5748
5749static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
5750{
5751 drflac_uint64 pcmFramesRead = 0;
5752 while (pcmFramesToSeek > 0) {
5753 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5754 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5755 break; /* Couldn't read the next frame, so just break from the loop and return. */
5756 }
5757 } else {
5758 if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
5759 pcmFramesRead += pcmFramesToSeek;
5760 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek; /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
5761 pcmFramesToSeek = 0;
5762 } else {
5763 pcmFramesRead += pFlac->currentFLACFrame.pcmFramesRemaining;
5764 pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
5765 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5766 }
5767 }
5768 }
5769
5770 pFlac->currentPCMFrame += pcmFramesRead;
5771 return pcmFramesRead;
5772}
5773
5774
5775static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
5776{
5777 drflac_bool32 isMidFrame = DRFLAC_FALSE;
5778 drflac_uint64 runningPCMFrameCount;
5779
5780 DRFLAC_ASSERT(pFlac != NULL);
5781
5782 /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
5783 if (pcmFrameIndex >= pFlac->currentPCMFrame) {
5784 /* Seeking forward. Need to seek from the current position. */
5785 runningPCMFrameCount = pFlac->currentPCMFrame;
5786
5787 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
5788 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
5789 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5790 return DRFLAC_FALSE;
5791 }
5792 } else {
5793 isMidFrame = DRFLAC_TRUE;
5794 }
5795 } else {
5796 /* Seeking backwards. Need to seek from the start of the file. */
5797 runningPCMFrameCount = 0;
5798
5799 /* Move back to the start. */
5800 if (!drflac__seek_to_first_frame(pFlac)) {
5801 return DRFLAC_FALSE;
5802 }
5803
5804 /* Decode the first frame in preparation for sample-exact seeking below. */
5805 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5806 return DRFLAC_FALSE;
5807 }
5808 }
5809
5810 /*
5811 We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
5812 header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
5813 */
5814 for (;;) {
5815 drflac_uint64 pcmFrameCountInThisFLACFrame;
5816 drflac_uint64 firstPCMFrameInFLACFrame = 0;
5817 drflac_uint64 lastPCMFrameInFLACFrame = 0;
5818
5819 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
5820
5821 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
5822 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
5823 /*
5824 The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
5825 it never existed and keep iterating.
5826 */
5827 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
5828
5829 if (!isMidFrame) {
5830 drflac_result result = drflac__decode_flac_frame(pFlac);
5831 if (result == DRFLAC_SUCCESS) {
5832 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
5833 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
5834 } else {
5835 if (result == DRFLAC_CRC_MISMATCH) {
5836 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5837 } else {
5838 return DRFLAC_FALSE;
5839 }
5840 }
5841 } else {
5842 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
5843 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
5844 }
5845 } else {
5846 /*
5847 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
5848 frame never existed and leave the running sample count untouched.
5849 */
5850 if (!isMidFrame) {
5851 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
5852 if (result == DRFLAC_SUCCESS) {
5853 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
5854 } else {
5855 if (result == DRFLAC_CRC_MISMATCH) {
5856 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
5857 } else {
5858 return DRFLAC_FALSE;
5859 }
5860 }
5861 } else {
5862 /*
5863 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
5864 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
5865 */
5866 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
5867 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
5868 isMidFrame = DRFLAC_FALSE;
5869 }
5870
5871 /* If we are seeking to the end of the file and we've just hit it, we're done. */
5872 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
5873 return DRFLAC_TRUE;
5874 }
5875 }
5876
5877 next_iteration:
5878 /* Grab the next frame in preparation for the next iteration. */
5879 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5880 return DRFLAC_FALSE;
5881 }
5882 }
5883}
5884
5885
5886#if !defined(DR_FLAC_NO_CRC)
5887/*
5888We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
5889uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
5890location.
5891*/
5892#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
5893
5894static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
5895{
5896 DRFLAC_ASSERT(pFlac != NULL);
5897 DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
5898 DRFLAC_ASSERT(targetByte >= rangeLo);
5899 DRFLAC_ASSERT(targetByte <= rangeHi);
5900
5901 *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
5902
5903 for (;;) {
5904 /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
5905 drflac_uint64 lastTargetByte = targetByte;
5906
5907 /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
5908 if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
5909 /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
5910 if (targetByte == 0) {
5911 drflac__seek_to_first_frame(pFlac); /* Try to recover. */
5912 return DRFLAC_FALSE;
5913 }
5914
5915 /* Halve the byte location and continue. */
5916 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5917 rangeHi = targetByte;
5918 } else {
5919 /* Getting here should mean that we have seeked to an appropriate byte. */
5920
5921 /* Clear the details of the FLAC frame so we don't misreport data. */
5922 DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
5923
5924 /*
5925 Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
5926 CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
5927 so it needs to stay this way for now.
5928 */
5929#if 1
5930 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
5931 /* Halve the byte location and continue. */
5932 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5933 rangeHi = targetByte;
5934 } else {
5935 break;
5936 }
5937#else
5938 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
5939 /* Halve the byte location and continue. */
5940 targetByte = rangeLo + ((rangeHi - rangeLo)/2);
5941 rangeHi = targetByte;
5942 } else {
5943 break;
5944 }
5945#endif
5946 }
5947
5948 /* We already tried this byte and there are no more to try, break out. */
5949 if(targetByte == lastTargetByte) {
5950 return DRFLAC_FALSE;
5951 }
5952 }
5953
5954 /* The current PCM frame needs to be updated based on the frame we just seeked to. */
5955 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
5956
5957 DRFLAC_ASSERT(targetByte <= rangeHi);
5958
5959 *pLastSuccessfulSeekOffset = targetByte;
5960 return DRFLAC_TRUE;
5961}
5962
5963static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
5964{
9e052883 5965 /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
5966#if 0
5967 if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
5968 /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
5969 if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
5970 return DRFLAC_FALSE;
5971 }
5972 }
5973#endif
5974
2ff0b512 5975 return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
5976}
5977
5978
5979static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
5980{
5981 /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
5982
5983 drflac_uint64 targetByte;
5984 drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
5985 drflac_uint64 pcmRangeHi = 0;
5986 drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
5987 drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
5988 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
5989
5990 targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
5991 if (targetByte > byteRangeHi) {
5992 targetByte = byteRangeHi;
5993 }
5994
5995 for (;;) {
5996 if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
5997 /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
5998 drflac_uint64 newPCMRangeLo;
5999 drflac_uint64 newPCMRangeHi;
6000 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
6001
6002 /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
6003 if (pcmRangeLo == newPCMRangeLo) {
6004 if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
6005 break; /* Failed to seek to closest frame. */
6006 }
6007
6008 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
6009 return DRFLAC_TRUE;
6010 } else {
6011 break; /* Failed to seek forward. */
6012 }
6013 }
6014
6015 pcmRangeLo = newPCMRangeLo;
6016 pcmRangeHi = newPCMRangeHi;
6017
6018 if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
6019 /* The target PCM frame is in this FLAC frame. */
6020 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
6021 return DRFLAC_TRUE;
6022 } else {
6023 break; /* Failed to seek to FLAC frame. */
6024 }
6025 } else {
6026 const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6027
6028 if (pcmRangeLo > pcmFrameIndex) {
6029 /* We seeked too far forward. We need to move our target byte backward and try again. */
6030 byteRangeHi = lastSuccessfulSeekOffset;
6031 if (byteRangeLo > byteRangeHi) {
6032 byteRangeLo = byteRangeHi;
6033 }
6034
6035 targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
6036 if (targetByte < byteRangeLo) {
6037 targetByte = byteRangeLo;
6038 }
6039 } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
6040 /* We didn't seek far enough. We need to move our target byte forward and try again. */
6041
6042 /* If we're close enough we can just seek forward. */
6043 if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
6044 if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
6045 return DRFLAC_TRUE;
6046 } else {
6047 break; /* Failed to seek to FLAC frame. */
6048 }
6049 } else {
6050 byteRangeLo = lastSuccessfulSeekOffset;
6051 if (byteRangeHi < byteRangeLo) {
6052 byteRangeHi = byteRangeLo;
6053 }
6054
6055 targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
6056 if (targetByte > byteRangeHi) {
6057 targetByte = byteRangeHi;
6058 }
6059
6060 if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
6061 closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
6062 }
6063 }
6064 }
6065 }
6066 } else {
6067 /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
6068 break;
6069 }
6070 }
6071
6072 drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
6073 return DRFLAC_FALSE;
6074}
6075
6076static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6077{
6078 drflac_uint64 byteRangeLo;
6079 drflac_uint64 byteRangeHi;
6080 drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
6081
6082 /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
6083 if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
6084 return DRFLAC_FALSE;
6085 }
6086
6087 /* If we're close enough to the start, just move to the start and seek forward. */
6088 if (pcmFrameIndex < seekForwardThreshold) {
6089 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
6090 }
6091
6092 /*
6093 Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
6094 the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
6095 */
6096 byteRangeLo = pFlac->firstFLACFramePosInBytes;
6097 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6098
6099 return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
6100}
6101#endif /* !DR_FLAC_NO_CRC */
6102
6103static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
6104{
6105 drflac_uint32 iClosestSeekpoint = 0;
6106 drflac_bool32 isMidFrame = DRFLAC_FALSE;
6107 drflac_uint64 runningPCMFrameCount;
6108 drflac_uint32 iSeekpoint;
6109
6110
6111 DRFLAC_ASSERT(pFlac != NULL);
6112
6113 if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
6114 return DRFLAC_FALSE;
6115 }
6116
9e052883 6117 /* Do not use the seektable if pcmFramIndex is not coverd by it. */
6118 if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
6119 return DRFLAC_FALSE;
6120 }
6121
2ff0b512 6122 for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
6123 if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
6124 break;
6125 }
6126
6127 iClosestSeekpoint = iSeekpoint;
6128 }
6129
6130 /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
6131 if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
6132 return DRFLAC_FALSE;
6133 }
6134 if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
6135 return DRFLAC_FALSE;
6136 }
6137
6138#if !defined(DR_FLAC_NO_CRC)
6139 /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
6140 if (pFlac->totalPCMFrameCount > 0) {
6141 drflac_uint64 byteRangeLo;
6142 drflac_uint64 byteRangeHi;
6143
6144 byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
6145 byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
6146
6147 /*
6148 If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
6149 value for byteRangeHi which will clamp it appropriately.
6150
6151 Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
6152 have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
6153 */
6154 if (iClosestSeekpoint < pFlac->seekpointCount-1) {
6155 drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
6156
6157 /* Basic validation on the seekpoints to ensure they're usable. */
6158 if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
6159 return DRFLAC_FALSE; /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
6160 }
6161
6162 if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
6163 byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
6164 }
6165 }
6166
6167 if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6168 if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6169 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
6170
6171 if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
6172 return DRFLAC_TRUE;
6173 }
6174 }
6175 }
6176 }
6177#endif /* !DR_FLAC_NO_CRC */
6178
6179 /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
6180
6181 /*
6182 If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
6183 from the seekpoint's first sample.
6184 */
6185 if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
6186 /* Optimized case. Just seek forward from where we are. */
6187 runningPCMFrameCount = pFlac->currentPCMFrame;
6188
6189 /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
6190 if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
6191 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6192 return DRFLAC_FALSE;
6193 }
6194 } else {
6195 isMidFrame = DRFLAC_TRUE;
6196 }
6197 } else {
6198 /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
6199 runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
6200
6201 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
6202 return DRFLAC_FALSE;
6203 }
6204
6205 /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
6206 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6207 return DRFLAC_FALSE;
6208 }
6209 }
6210
6211 for (;;) {
6212 drflac_uint64 pcmFrameCountInThisFLACFrame;
6213 drflac_uint64 firstPCMFrameInFLACFrame = 0;
6214 drflac_uint64 lastPCMFrameInFLACFrame = 0;
6215
6216 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
6217
6218 pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
6219 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
6220 /*
6221 The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
6222 it never existed and keep iterating.
6223 */
6224 drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
6225
6226 if (!isMidFrame) {
6227 drflac_result result = drflac__decode_flac_frame(pFlac);
6228 if (result == DRFLAC_SUCCESS) {
6229 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
6230 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
6231 } else {
6232 if (result == DRFLAC_CRC_MISMATCH) {
6233 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6234 } else {
6235 return DRFLAC_FALSE;
6236 }
6237 }
6238 } else {
6239 /* We started seeking mid-frame which means we need to skip the frame decoding part. */
6240 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
6241 }
6242 } else {
6243 /*
6244 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
6245 frame never existed and leave the running sample count untouched.
6246 */
6247 if (!isMidFrame) {
6248 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
6249 if (result == DRFLAC_SUCCESS) {
6250 runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
6251 } else {
6252 if (result == DRFLAC_CRC_MISMATCH) {
6253 goto next_iteration; /* CRC mismatch. Pretend this frame never existed. */
6254 } else {
6255 return DRFLAC_FALSE;
6256 }
6257 }
6258 } else {
6259 /*
6260 We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
6261 drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
6262 */
6263 runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
6264 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
6265 isMidFrame = DRFLAC_FALSE;
6266 }
6267
6268 /* If we are seeking to the end of the file and we've just hit it, we're done. */
6269 if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
6270 return DRFLAC_TRUE;
6271 }
6272 }
6273
6274 next_iteration:
6275 /* Grab the next frame in preparation for the next iteration. */
6276 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
6277 return DRFLAC_FALSE;
6278 }
6279 }
6280}
6281
6282
6283#ifndef DR_FLAC_NO_OGG
6284typedef struct
6285{
6286 drflac_uint8 capturePattern[4]; /* Should be "OggS" */
6287 drflac_uint8 structureVersion; /* Always 0. */
6288 drflac_uint8 headerType;
6289 drflac_uint64 granulePosition;
6290 drflac_uint32 serialNumber;
6291 drflac_uint32 sequenceNumber;
6292 drflac_uint32 checksum;
6293 drflac_uint8 segmentCount;
6294 drflac_uint8 segmentTable[255];
6295} drflac_ogg_page_header;
6296#endif
6297
6298typedef struct
6299{
6300 drflac_read_proc onRead;
6301 drflac_seek_proc onSeek;
6302 drflac_meta_proc onMeta;
6303 drflac_container container;
6304 void* pUserData;
6305 void* pUserDataMD;
6306 drflac_uint32 sampleRate;
6307 drflac_uint8 channels;
6308 drflac_uint8 bitsPerSample;
6309 drflac_uint64 totalPCMFrameCount;
6310 drflac_uint16 maxBlockSizeInPCMFrames;
6311 drflac_uint64 runningFilePos;
6312 drflac_bool32 hasStreamInfoBlock;
6313 drflac_bool32 hasMetadataBlocks;
6314 drflac_bs bs; /* <-- A bit streamer is required for loading data during initialization. */
6315 drflac_frame_header firstFrameHeader; /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
6316
6317#ifndef DR_FLAC_NO_OGG
6318 drflac_uint32 oggSerial;
6319 drflac_uint64 oggFirstBytePos;
6320 drflac_ogg_page_header oggBosHeader;
6321#endif
6322} drflac_init_info;
6323
6324static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6325{
6326 blockHeader = drflac__be2host_32(blockHeader);
6327 *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
6328 *blockType = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
6329 *blockSize = (blockHeader & 0x00FFFFFFUL);
6330}
6331
6332static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
6333{
6334 drflac_uint32 blockHeader;
6335
6336 *blockSize = 0;
6337 if (onRead(pUserData, &blockHeader, 4) != 4) {
6338 return DRFLAC_FALSE;
6339 }
6340
6341 drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
6342 return DRFLAC_TRUE;
6343}
6344
6345static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
6346{
6347 drflac_uint32 blockSizes;
6348 drflac_uint64 frameSizes = 0;
6349 drflac_uint64 importantProps;
6350 drflac_uint8 md5[16];
6351
6352 /* min/max block size. */
6353 if (onRead(pUserData, &blockSizes, 4) != 4) {
6354 return DRFLAC_FALSE;
6355 }
6356
6357 /* min/max frame size. */
6358 if (onRead(pUserData, &frameSizes, 6) != 6) {
6359 return DRFLAC_FALSE;
6360 }
6361
6362 /* Sample rate, channels, bits per sample and total sample count. */
6363 if (onRead(pUserData, &importantProps, 8) != 8) {
6364 return DRFLAC_FALSE;
6365 }
6366
6367 /* MD5 */
6368 if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
6369 return DRFLAC_FALSE;
6370 }
6371
6372 blockSizes = drflac__be2host_32(blockSizes);
6373 frameSizes = drflac__be2host_64(frameSizes);
6374 importantProps = drflac__be2host_64(importantProps);
6375
6376 pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
6377 pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
6378 pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
6379 pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes & (((drflac_uint64)0x00FFFFFF << 16) << 0)) >> 16);
6380 pStreamInfo->sampleRate = (drflac_uint32)((importantProps & (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
6381 pStreamInfo->channels = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
6382 pStreamInfo->bitsPerSample = (drflac_uint8 )((importantProps & (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
6383 pStreamInfo->totalPCMFrameCount = ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
6384 DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
6385
6386 return DRFLAC_TRUE;
6387}
6388
6389
6390static void* drflac__malloc_default(size_t sz, void* pUserData)
6391{
6392 (void)pUserData;
6393 return DRFLAC_MALLOC(sz);
6394}
6395
6396static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
6397{
6398 (void)pUserData;
6399 return DRFLAC_REALLOC(p, sz);
6400}
6401
6402static void drflac__free_default(void* p, void* pUserData)
6403{
6404 (void)pUserData;
6405 DRFLAC_FREE(p);
6406}
6407
6408
6409static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
6410{
6411 if (pAllocationCallbacks == NULL) {
6412 return NULL;
6413 }
6414
6415 if (pAllocationCallbacks->onMalloc != NULL) {
6416 return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
6417 }
6418
6419 /* Try using realloc(). */
6420 if (pAllocationCallbacks->onRealloc != NULL) {
6421 return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
6422 }
6423
6424 return NULL;
6425}
6426
6427static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
6428{
6429 if (pAllocationCallbacks == NULL) {
6430 return NULL;
6431 }
6432
6433 if (pAllocationCallbacks->onRealloc != NULL) {
6434 return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
6435 }
6436
6437 /* Try emulating realloc() in terms of malloc()/free(). */
6438 if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
6439 void* p2;
6440
6441 p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
6442 if (p2 == NULL) {
6443 return NULL;
6444 }
6445
6446 if (p != NULL) {
6447 DRFLAC_COPY_MEMORY(p2, p, szOld);
6448 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6449 }
6450
6451 return p2;
6452 }
6453
6454 return NULL;
6455}
6456
6457static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
6458{
6459 if (p == NULL || pAllocationCallbacks == NULL) {
6460 return;
6461 }
6462
6463 if (pAllocationCallbacks->onFree != NULL) {
6464 pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
6465 }
6466}
6467
6468
9e052883 6469static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
2ff0b512 6470{
6471 /*
6472 We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
6473 we'll be sitting on byte 42.
6474 */
6475 drflac_uint64 runningFilePos = 42;
6476 drflac_uint64 seektablePos = 0;
6477 drflac_uint32 seektableSize = 0;
6478
6479 for (;;) {
6480 drflac_metadata metadata;
6481 drflac_uint8 isLastBlock = 0;
6482 drflac_uint8 blockType;
6483 drflac_uint32 blockSize;
6484 if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
6485 return DRFLAC_FALSE;
6486 }
6487 runningFilePos += 4;
6488
6489 metadata.type = blockType;
6490 metadata.pRawData = NULL;
6491 metadata.rawDataSize = 0;
6492
6493 switch (blockType)
6494 {
6495 case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
6496 {
6497 if (blockSize < 4) {
6498 return DRFLAC_FALSE;
6499 }
6500
6501 if (onMeta) {
6502 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6503 if (pRawData == NULL) {
6504 return DRFLAC_FALSE;
6505 }
6506
6507 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6508 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6509 return DRFLAC_FALSE;
6510 }
6511
6512 metadata.pRawData = pRawData;
6513 metadata.rawDataSize = blockSize;
6514 metadata.data.application.id = drflac__be2host_32(*(drflac_uint32*)pRawData);
6515 metadata.data.application.pData = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
6516 metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
6517 onMeta(pUserDataMD, &metadata);
6518
6519 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6520 }
6521 } break;
6522
6523 case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
6524 {
6525 seektablePos = runningFilePos;
6526 seektableSize = blockSize;
6527
6528 if (onMeta) {
9e052883 6529 drflac_uint32 seekpointCount;
2ff0b512 6530 drflac_uint32 iSeekpoint;
6531 void* pRawData;
6532
9e052883 6533 seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6534
6535 pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
2ff0b512 6536 if (pRawData == NULL) {
6537 return DRFLAC_FALSE;
6538 }
6539
9e052883 6540 /* We need to read seekpoint by seekpoint and do some processing. */
6541 for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
6542 drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
2ff0b512 6543
9e052883 6544 if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
6545 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6546 return DRFLAC_FALSE;
6547 }
2ff0b512 6548
9e052883 6549 /* Endian swap. */
2ff0b512 6550 pSeekpoint->firstPCMFrame = drflac__be2host_64(pSeekpoint->firstPCMFrame);
6551 pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
6552 pSeekpoint->pcmFrameCount = drflac__be2host_16(pSeekpoint->pcmFrameCount);
6553 }
6554
9e052883 6555 metadata.pRawData = pRawData;
6556 metadata.rawDataSize = blockSize;
6557 metadata.data.seektable.seekpointCount = seekpointCount;
6558 metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
6559
2ff0b512 6560 onMeta(pUserDataMD, &metadata);
6561
6562 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6563 }
6564 } break;
6565
6566 case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
6567 {
6568 if (blockSize < 8) {
6569 return DRFLAC_FALSE;
6570 }
6571
6572 if (onMeta) {
6573 void* pRawData;
6574 const char* pRunningData;
6575 const char* pRunningDataEnd;
6576 drflac_uint32 i;
6577
6578 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6579 if (pRawData == NULL) {
6580 return DRFLAC_FALSE;
6581 }
6582
6583 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6584 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6585 return DRFLAC_FALSE;
6586 }
6587
6588 metadata.pRawData = pRawData;
6589 metadata.rawDataSize = blockSize;
6590
6591 pRunningData = (const char*)pRawData;
6592 pRunningDataEnd = (const char*)pRawData + blockSize;
6593
9e052883 6594 metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6595
6596 /* Need space for the rest of the block */
6597 if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6598 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6599 return DRFLAC_FALSE;
6600 }
6601 metadata.data.vorbis_comment.vendor = pRunningData; pRunningData += metadata.data.vorbis_comment.vendorLength;
9e052883 6602 metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6603
6604 /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
6605 if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
6606 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6607 return DRFLAC_FALSE;
6608 }
6609 metadata.data.vorbis_comment.pComments = pRunningData;
6610
6611 /* Check that the comments section is valid before passing it to the callback */
6612 for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
6613 drflac_uint32 commentLength;
6614
6615 if (pRunningDataEnd - pRunningData < 4) {
6616 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6617 return DRFLAC_FALSE;
6618 }
6619
9e052883 6620 commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6621 if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6622 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6623 return DRFLAC_FALSE;
6624 }
6625 pRunningData += commentLength;
6626 }
6627
6628 onMeta(pUserDataMD, &metadata);
6629
6630 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6631 }
6632 } break;
6633
6634 case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
6635 {
6636 if (blockSize < 396) {
6637 return DRFLAC_FALSE;
6638 }
6639
6640 if (onMeta) {
6641 void* pRawData;
6642 const char* pRunningData;
6643 const char* pRunningDataEnd;
9e052883 6644 size_t bufferSize;
2ff0b512 6645 drflac_uint8 iTrack;
6646 drflac_uint8 iIndex;
9e052883 6647 void* pTrackData;
2ff0b512 6648
9e052883 6649 /*
6650 This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
6651 we need for storing the necessary data. The second pass will fill that buffer with usable data.
6652 */
2ff0b512 6653 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6654 if (pRawData == NULL) {
6655 return DRFLAC_FALSE;
6656 }
6657
6658 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6659 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6660 return DRFLAC_FALSE;
6661 }
6662
6663 metadata.pRawData = pRawData;
6664 metadata.rawDataSize = blockSize;
6665
6666 pRunningData = (const char*)pRawData;
6667 pRunningDataEnd = (const char*)pRawData + blockSize;
6668
6669 DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128); pRunningData += 128;
6670 metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
6671 metadata.data.cuesheet.isCD = (pRunningData[0] & 0x80) != 0; pRunningData += 259;
6672 metadata.data.cuesheet.trackCount = pRunningData[0]; pRunningData += 1;
9e052883 6673 metadata.data.cuesheet.pTrackData = NULL; /* Will be filled later. */
6674
6675 /* Pass 1: Calculate the size of the buffer for the track data. */
6676 {
6677 const char* pRunningDataSaved = pRunningData; /* Will be restored at the end in preparation for the second pass. */
2ff0b512 6678
9e052883 6679 bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
2ff0b512 6680
9e052883 6681 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6682 drflac_uint8 indexCount;
6683 drflac_uint32 indexPointSize;
6684
6685 if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
6686 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6687 return DRFLAC_FALSE;
6688 }
6689
6690 /* Skip to the index point count */
6691 pRunningData += 35;
6692
6693 indexCount = pRunningData[0];
6694 pRunningData += 1;
6695
6696 bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
6697
6698 /* Quick validation check. */
6699 indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6700 if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
6701 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6702 return DRFLAC_FALSE;
6703 }
6704
6705 pRunningData += indexPointSize;
2ff0b512 6706 }
6707
9e052883 6708 pRunningData = pRunningDataSaved;
6709 }
6710
6711 /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
6712 {
6713 char* pRunningTrackData;
6714
6715 pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
6716 if (pTrackData == NULL) {
2ff0b512 6717 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6718 return DRFLAC_FALSE;
6719 }
6720
9e052883 6721 pRunningTrackData = (char*)pTrackData;
6722
6723 for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
6724 drflac_uint8 indexCount;
6725
6726 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
6727 pRunningData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
6728 pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
6729
6730 /* Grab the index count for the next part. */
6731 indexCount = pRunningData[0];
6732 pRunningData += 1;
6733 pRunningTrackData += 1;
6734
6735 /* Extract each track index. */
6736 for (iIndex = 0; iIndex < indexCount; ++iIndex) {
6737 drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
6738
6739 DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
6740 pRunningData += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
6741 pRunningTrackData += sizeof(drflac_cuesheet_track_index);
6742
6743 pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
6744 }
2ff0b512 6745 }
9e052883 6746
6747 metadata.data.cuesheet.pTrackData = pTrackData;
2ff0b512 6748 }
6749
9e052883 6750 /* The original data is no longer needed. */
6751 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6752 pRawData = NULL;
6753
2ff0b512 6754 onMeta(pUserDataMD, &metadata);
6755
9e052883 6756 drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
6757 pTrackData = NULL;
2ff0b512 6758 }
6759 } break;
6760
6761 case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
6762 {
6763 if (blockSize < 32) {
6764 return DRFLAC_FALSE;
6765 }
6766
6767 if (onMeta) {
6768 void* pRawData;
6769 const char* pRunningData;
6770 const char* pRunningDataEnd;
6771
6772 pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6773 if (pRawData == NULL) {
6774 return DRFLAC_FALSE;
6775 }
6776
6777 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6778 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6779 return DRFLAC_FALSE;
6780 }
6781
6782 metadata.pRawData = pRawData;
6783 metadata.rawDataSize = blockSize;
6784
6785 pRunningData = (const char*)pRawData;
6786 pRunningDataEnd = (const char*)pRawData + blockSize;
6787
9e052883 6788 metadata.data.picture.type = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6789 metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6790
6791 /* Need space for the rest of the block */
6792 if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6793 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6794 return DRFLAC_FALSE;
6795 }
9e052883 6796 metadata.data.picture.mime = pRunningData; pRunningData += metadata.data.picture.mimeLength;
6797 metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6798
6799 /* Need space for the rest of the block */
6800 if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
6801 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6802 return DRFLAC_FALSE;
6803 }
9e052883 6804 metadata.data.picture.description = pRunningData; pRunningData += metadata.data.picture.descriptionLength;
6805 metadata.data.picture.width = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6806 metadata.data.picture.height = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6807 metadata.data.picture.colorDepth = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6808 metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
6809 metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
2ff0b512 6810 metadata.data.picture.pPictureData = (const drflac_uint8*)pRunningData;
6811
6812 /* Need space for the picture after the block */
6813 if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
6814 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6815 return DRFLAC_FALSE;
6816 }
6817
6818 onMeta(pUserDataMD, &metadata);
6819
6820 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6821 }
6822 } break;
6823
6824 case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
6825 {
6826 if (onMeta) {
6827 metadata.data.padding.unused = 0;
6828
6829 /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
6830 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6831 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6832 } else {
6833 onMeta(pUserDataMD, &metadata);
6834 }
6835 }
6836 } break;
6837
6838 case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
6839 {
6840 /* Invalid chunk. Just skip over this one. */
6841 if (onMeta) {
6842 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6843 isLastBlock = DRFLAC_TRUE; /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
6844 }
6845 }
6846 } break;
6847
6848 default:
6849 {
6850 /*
6851 It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
6852 can at the very least report the chunk to the application and let it look at the raw data.
6853 */
6854 if (onMeta) {
6855 void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
6856 if (pRawData == NULL) {
6857 return DRFLAC_FALSE;
6858 }
6859
6860 if (onRead(pUserData, pRawData, blockSize) != blockSize) {
6861 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6862 return DRFLAC_FALSE;
6863 }
6864
6865 metadata.pRawData = pRawData;
6866 metadata.rawDataSize = blockSize;
6867 onMeta(pUserDataMD, &metadata);
6868
6869 drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
6870 }
6871 } break;
6872 }
6873
6874 /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
6875 if (onMeta == NULL && blockSize > 0) {
6876 if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
6877 isLastBlock = DRFLAC_TRUE;
6878 }
6879 }
6880
6881 runningFilePos += blockSize;
6882 if (isLastBlock) {
6883 break;
6884 }
6885 }
6886
9e052883 6887 *pSeektablePos = seektablePos;
6888 *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
6889 *pFirstFramePos = runningFilePos;
2ff0b512 6890
6891 return DRFLAC_TRUE;
6892}
6893
6894static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
6895{
6896 /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
6897
6898 drflac_uint8 isLastBlock;
6899 drflac_uint8 blockType;
6900 drflac_uint32 blockSize;
6901
6902 (void)onSeek;
6903
6904 pInit->container = drflac_container_native;
6905
6906 /* The first metadata block should be the STREAMINFO block. */
6907 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
6908 return DRFLAC_FALSE;
6909 }
6910
6911 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
6912 if (!relaxed) {
6913 /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
6914 return DRFLAC_FALSE;
6915 } else {
6916 /*
6917 Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
6918 for that frame.
6919 */
6920 pInit->hasStreamInfoBlock = DRFLAC_FALSE;
6921 pInit->hasMetadataBlocks = DRFLAC_FALSE;
6922
6923 if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
6924 return DRFLAC_FALSE; /* Couldn't find a frame. */
6925 }
6926
6927 if (pInit->firstFrameHeader.bitsPerSample == 0) {
6928 return DRFLAC_FALSE; /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
6929 }
6930
6931 pInit->sampleRate = pInit->firstFrameHeader.sampleRate;
6932 pInit->channels = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
6933 pInit->bitsPerSample = pInit->firstFrameHeader.bitsPerSample;
6934 pInit->maxBlockSizeInPCMFrames = 65535; /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
6935 return DRFLAC_TRUE;
6936 }
6937 } else {
6938 drflac_streaminfo streaminfo;
6939 if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
6940 return DRFLAC_FALSE;
6941 }
6942
6943 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
6944 pInit->sampleRate = streaminfo.sampleRate;
6945 pInit->channels = streaminfo.channels;
6946 pInit->bitsPerSample = streaminfo.bitsPerSample;
6947 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
6948 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames; /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
6949 pInit->hasMetadataBlocks = !isLastBlock;
6950
6951 if (onMeta) {
6952 drflac_metadata metadata;
6953 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
6954 metadata.pRawData = NULL;
6955 metadata.rawDataSize = 0;
6956 metadata.data.streaminfo = streaminfo;
6957 onMeta(pUserDataMD, &metadata);
6958 }
6959
6960 return DRFLAC_TRUE;
6961 }
6962}
6963
6964#ifndef DR_FLAC_NO_OGG
6965#define DRFLAC_OGG_MAX_PAGE_SIZE 65307
6966#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32 1605413199 /* CRC-32 of "OggS". */
6967
6968typedef enum
6969{
6970 drflac_ogg_recover_on_crc_mismatch,
6971 drflac_ogg_fail_on_crc_mismatch
6972} drflac_ogg_crc_mismatch_recovery;
6973
6974#ifndef DR_FLAC_NO_CRC
6975static drflac_uint32 drflac__crc32_table[] = {
6976 0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
6977 0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
6978 0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
6979 0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
6980 0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
6981 0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
6982 0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
6983 0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
6984 0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
6985 0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
6986 0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
6987 0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
6988 0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
6989 0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
6990 0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
6991 0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
6992 0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
6993 0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
6994 0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
6995 0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
6996 0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
6997 0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
6998 0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
6999 0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
7000 0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
7001 0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
7002 0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
7003 0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
7004 0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
7005 0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
7006 0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
7007 0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
7008 0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
7009 0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
7010 0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
7011 0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
7012 0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
7013 0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
7014 0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
7015 0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
7016 0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
7017 0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
7018 0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
7019 0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
7020 0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
7021 0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
7022 0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
7023 0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
7024 0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
7025 0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
7026 0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
7027 0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
7028 0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
7029 0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
7030 0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
7031 0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
7032 0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
7033 0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
7034 0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
7035 0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
7036 0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
7037 0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
7038 0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
7039 0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
7040};
7041#endif
7042
7043static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
7044{
7045#ifndef DR_FLAC_NO_CRC
7046 return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
7047#else
7048 (void)data;
7049 return crc32;
7050#endif
7051}
7052
9e052883 7053#if 0
7054static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
7055{
7056 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
7057 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
7058 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 8) & 0xFF));
7059 crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 0) & 0xFF));
7060 return crc32;
7061}
7062
7063static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
7064{
7065 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
7066 crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 0) & 0xFFFFFFFF));
7067 return crc32;
7068}
7069#endif
7070
2ff0b512 7071static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
7072{
7073 /* This can be optimized. */
7074 drflac_uint32 i;
7075 for (i = 0; i < dataSize; ++i) {
7076 crc32 = drflac_crc32_byte(crc32, pData[i]);
7077 }
7078 return crc32;
7079}
7080
7081
7082static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
7083{
7084 return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
7085}
7086
7087static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
7088{
7089 return 27 + pHeader->segmentCount;
7090}
7091
7092static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
7093{
7094 drflac_uint32 pageBodySize = 0;
7095 int i;
7096
7097 for (i = 0; i < pHeader->segmentCount; ++i) {
7098 pageBodySize += pHeader->segmentTable[i];
7099 }
7100
7101 return pageBodySize;
7102}
7103
7104static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7105{
7106 drflac_uint8 data[23];
7107 drflac_uint32 i;
7108
7109 DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
7110
7111 if (onRead(pUserData, data, 23) != 23) {
7112 return DRFLAC_AT_END;
7113 }
7114 *pBytesRead += 23;
7115
7116 /*
7117 It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
7118 us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
7119 like to have it map to the structure of the underlying data.
7120 */
7121 pHeader->capturePattern[0] = 'O';
7122 pHeader->capturePattern[1] = 'g';
7123 pHeader->capturePattern[2] = 'g';
7124 pHeader->capturePattern[3] = 'S';
7125
7126 pHeader->structureVersion = data[0];
7127 pHeader->headerType = data[1];
7128 DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
7129 DRFLAC_COPY_MEMORY(&pHeader->serialNumber, &data[10], 4);
7130 DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber, &data[14], 4);
7131 DRFLAC_COPY_MEMORY(&pHeader->checksum, &data[18], 4);
7132 pHeader->segmentCount = data[22];
7133
7134 /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
7135 data[18] = 0;
7136 data[19] = 0;
7137 data[20] = 0;
7138 data[21] = 0;
7139
7140 for (i = 0; i < 23; ++i) {
7141 *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
7142 }
7143
7144
7145 if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
7146 return DRFLAC_AT_END;
7147 }
7148 *pBytesRead += pHeader->segmentCount;
7149
7150 for (i = 0; i < pHeader->segmentCount; ++i) {
7151 *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
7152 }
7153
7154 return DRFLAC_SUCCESS;
7155}
7156
7157static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
7158{
7159 drflac_uint8 id[4];
7160
7161 *pBytesRead = 0;
7162
7163 if (onRead(pUserData, id, 4) != 4) {
7164 return DRFLAC_AT_END;
7165 }
7166 *pBytesRead += 4;
7167
7168 /* We need to read byte-by-byte until we find the OggS capture pattern. */
7169 for (;;) {
7170 if (drflac_ogg__is_capture_pattern(id)) {
7171 drflac_result result;
7172
7173 *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7174
7175 result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
7176 if (result == DRFLAC_SUCCESS) {
7177 return DRFLAC_SUCCESS;
7178 } else {
7179 if (result == DRFLAC_CRC_MISMATCH) {
7180 continue;
7181 } else {
7182 return result;
7183 }
7184 }
7185 } else {
7186 /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
7187 id[0] = id[1];
7188 id[1] = id[2];
7189 id[2] = id[3];
7190 if (onRead(pUserData, &id[3], 1) != 1) {
7191 return DRFLAC_AT_END;
7192 }
7193 *pBytesRead += 1;
7194 }
7195 }
7196}
7197
7198
7199/*
7200The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
7201in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
7202in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
7203dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
7204the physical Ogg bitstream are converted and delivered in native FLAC format.
7205*/
7206typedef struct
7207{
7208 drflac_read_proc onRead; /* The original onRead callback from drflac_open() and family. */
7209 drflac_seek_proc onSeek; /* The original onSeek callback from drflac_open() and family. */
7210 void* pUserData; /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
7211 drflac_uint64 currentBytePos; /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
7212 drflac_uint64 firstBytePos; /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
7213 drflac_uint32 serialNumber; /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
7214 drflac_ogg_page_header bosPageHeader; /* Used for seeking. */
7215 drflac_ogg_page_header currentPageHeader;
7216 drflac_uint32 bytesRemainingInPage;
7217 drflac_uint32 pageDataSize;
7218 drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
7219} drflac_oggbs; /* oggbs = Ogg Bitstream */
7220
7221static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
7222{
7223 size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
7224 oggbs->currentBytePos += bytesActuallyRead;
7225
7226 return bytesActuallyRead;
7227}
7228
7229static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
7230{
7231 if (origin == drflac_seek_origin_start) {
7232 if (offset <= 0x7FFFFFFF) {
7233 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
7234 return DRFLAC_FALSE;
7235 }
7236 oggbs->currentBytePos = offset;
7237
7238 return DRFLAC_TRUE;
7239 } else {
7240 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
7241 return DRFLAC_FALSE;
7242 }
7243 oggbs->currentBytePos = offset;
7244
7245 return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
7246 }
7247 } else {
7248 while (offset > 0x7FFFFFFF) {
7249 if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
7250 return DRFLAC_FALSE;
7251 }
7252 oggbs->currentBytePos += 0x7FFFFFFF;
7253 offset -= 0x7FFFFFFF;
7254 }
7255
7256 if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) { /* <-- Safe cast thanks to the loop above. */
7257 return DRFLAC_FALSE;
7258 }
7259 oggbs->currentBytePos += offset;
7260
7261 return DRFLAC_TRUE;
7262 }
7263}
7264
7265static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
7266{
7267 drflac_ogg_page_header header;
7268 for (;;) {
7269 drflac_uint32 crc32 = 0;
7270 drflac_uint32 bytesRead;
7271 drflac_uint32 pageBodySize;
7272#ifndef DR_FLAC_NO_CRC
7273 drflac_uint32 actualCRC32;
7274#endif
7275
7276 if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7277 return DRFLAC_FALSE;
7278 }
7279 oggbs->currentBytePos += bytesRead;
7280
7281 pageBodySize = drflac_ogg__get_page_body_size(&header);
7282 if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
7283 continue; /* Invalid page size. Assume it's corrupted and just move to the next page. */
7284 }
7285
7286 if (header.serialNumber != oggbs->serialNumber) {
7287 /* It's not a FLAC page. Skip it. */
7288 if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
7289 return DRFLAC_FALSE;
7290 }
7291 continue;
7292 }
7293
7294
7295 /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
7296 if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
7297 return DRFLAC_FALSE;
7298 }
7299 oggbs->pageDataSize = pageBodySize;
7300
7301#ifndef DR_FLAC_NO_CRC
7302 actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
7303 if (actualCRC32 != header.checksum) {
7304 if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
7305 continue; /* CRC mismatch. Skip this page. */
7306 } else {
7307 /*
7308 Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
7309 go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
7310 seek did not fully complete.
7311 */
7312 drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
7313 return DRFLAC_FALSE;
7314 }
7315 }
7316#else
7317 (void)recoveryMethod; /* <-- Silence a warning. */
7318#endif
7319
7320 oggbs->currentPageHeader = header;
7321 oggbs->bytesRemainingInPage = pageBodySize;
7322 return DRFLAC_TRUE;
7323 }
7324}
7325
9e052883 7326/* Function below is unused at the moment, but I might be re-adding it later. */
7327#if 0
7328static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
7329{
7330 drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
7331 drflac_uint8 iSeg = 0;
7332 drflac_uint32 iByte = 0;
7333 while (iByte < bytesConsumedInPage) {
7334 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7335 if (iByte + segmentSize > bytesConsumedInPage) {
7336 break;
7337 } else {
7338 iSeg += 1;
7339 iByte += segmentSize;
7340 }
7341 }
7342
7343 *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
7344 return iSeg;
7345}
7346
7347static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
7348{
7349 /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
7350 for (;;) {
7351 drflac_bool32 atEndOfPage = DRFLAC_FALSE;
7352
7353 drflac_uint8 bytesRemainingInSeg;
7354 drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
7355
7356 drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
7357 for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
7358 drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
7359 if (segmentSize < 255) {
7360 if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
7361 atEndOfPage = DRFLAC_TRUE;
7362 }
7363
7364 break;
7365 }
7366
7367 bytesToEndOfPacketOrPage += segmentSize;
7368 }
7369
7370 /*
7371 At this point we will have found either the packet or the end of the page. If were at the end of the page we'll
7372 want to load the next page and keep searching for the end of the packet.
7373 */
7374 drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
7375 oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
7376
7377 if (atEndOfPage) {
7378 /*
7379 We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
7380 straddle pages.
7381 */
7382 if (!drflac_oggbs__goto_next_page(oggbs)) {
7383 return DRFLAC_FALSE;
7384 }
7385
7386 /* If it's a fresh packet it most likely means we're at the next packet. */
7387 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
7388 return DRFLAC_TRUE;
7389 }
7390 } else {
7391 /* We're at the next packet. */
7392 return DRFLAC_TRUE;
7393 }
7394 }
7395}
7396
7397static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
7398{
7399 /* The bitstream should be sitting on the first byte just after the header of the frame. */
7400
7401 /* What we're actually doing here is seeking to the start of the next packet. */
7402 return drflac_oggbs__seek_to_next_packet(oggbs);
7403}
7404#endif
7405
2ff0b512 7406static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
7407{
7408 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7409 drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
7410 size_t bytesRead = 0;
7411
7412 DRFLAC_ASSERT(oggbs != NULL);
7413 DRFLAC_ASSERT(pRunningBufferOut != NULL);
7414
7415 /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
7416 while (bytesRead < bytesToRead) {
7417 size_t bytesRemainingToRead = bytesToRead - bytesRead;
7418
7419 if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
7420 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
7421 bytesRead += bytesRemainingToRead;
7422 oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
7423 break;
7424 }
7425
7426 /* If we get here it means some of the requested data is contained in the next pages. */
7427 if (oggbs->bytesRemainingInPage > 0) {
7428 DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
7429 bytesRead += oggbs->bytesRemainingInPage;
7430 pRunningBufferOut += oggbs->bytesRemainingInPage;
7431 oggbs->bytesRemainingInPage = 0;
7432 }
7433
7434 DRFLAC_ASSERT(bytesRemainingToRead > 0);
7435 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7436 break; /* Failed to go to the next page. Might have simply hit the end of the stream. */
7437 }
7438 }
7439
7440 return bytesRead;
7441}
7442
7443static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
7444{
7445 drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
7446 int bytesSeeked = 0;
7447
7448 DRFLAC_ASSERT(oggbs != NULL);
7449 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
7450
7451 /* Seeking is always forward which makes things a lot simpler. */
7452 if (origin == drflac_seek_origin_start) {
7453 if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
7454 return DRFLAC_FALSE;
7455 }
7456
7457 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7458 return DRFLAC_FALSE;
7459 }
7460
7461 return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
7462 }
7463
7464 DRFLAC_ASSERT(origin == drflac_seek_origin_current);
7465
7466 while (bytesSeeked < offset) {
7467 int bytesRemainingToSeek = offset - bytesSeeked;
7468 DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
7469
7470 if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
7471 bytesSeeked += bytesRemainingToSeek;
7472 (void)bytesSeeked; /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
7473 oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
7474 break;
7475 }
7476
7477 /* If we get here it means some of the requested data is contained in the next pages. */
7478 if (oggbs->bytesRemainingInPage > 0) {
7479 bytesSeeked += (int)oggbs->bytesRemainingInPage;
7480 oggbs->bytesRemainingInPage = 0;
7481 }
7482
7483 DRFLAC_ASSERT(bytesRemainingToSeek > 0);
7484 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
7485 /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
7486 return DRFLAC_FALSE;
7487 }
7488 }
7489
7490 return DRFLAC_TRUE;
7491}
7492
7493
7494static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
7495{
7496 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
7497 drflac_uint64 originalBytePos;
7498 drflac_uint64 runningGranulePosition;
7499 drflac_uint64 runningFrameBytePos;
7500 drflac_uint64 runningPCMFrameCount;
7501
7502 DRFLAC_ASSERT(oggbs != NULL);
7503
7504 originalBytePos = oggbs->currentBytePos; /* For recovery. Points to the OggS identifier. */
7505
7506 /* First seek to the first frame. */
7507 if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
7508 return DRFLAC_FALSE;
7509 }
7510 oggbs->bytesRemainingInPage = 0;
7511
7512 runningGranulePosition = 0;
7513 for (;;) {
7514 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7515 drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
7516 return DRFLAC_FALSE; /* Never did find that sample... */
7517 }
7518
7519 runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
7520 if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
7521 break; /* The sample is somewhere in the previous page. */
7522 }
7523
7524 /*
7525 At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
7526 disregard any pages that do not begin a fresh packet.
7527 */
7528 if ((oggbs->currentPageHeader.headerType & 0x01) == 0) { /* <-- Is it a fresh page? */
7529 if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
7530 drflac_uint8 firstBytesInPage[2];
7531 firstBytesInPage[0] = oggbs->pageData[0];
7532 firstBytesInPage[1] = oggbs->pageData[1];
7533
7534 if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) { /* <-- Does the page begin with a frame's sync code? */
7535 runningGranulePosition = oggbs->currentPageHeader.granulePosition;
7536 }
7537
7538 continue;
7539 }
7540 }
7541 }
7542
7543 /*
7544 We found the page that that is closest to the sample, so now we need to find it. The first thing to do is seek to the
7545 start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
7546 a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
7547 we find the one containing the target sample.
7548 */
7549 if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
7550 return DRFLAC_FALSE;
7551 }
7552 if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
7553 return DRFLAC_FALSE;
7554 }
7555
7556 /*
7557 At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
7558 looping over these frames until we find the one containing the sample we're after.
7559 */
7560 runningPCMFrameCount = runningGranulePosition;
7561 for (;;) {
7562 /*
7563 There are two ways to find the sample and seek past irrelevant frames:
7564 1) Use the native FLAC decoder.
7565 2) Use Ogg's framing system.
7566
7567 Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
7568 do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
7569 duplication for the decoding of frame headers.
7570
7571 Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
7572 bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
7573 standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
7574 the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
7575 using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
7576 avoid the use of the drflac_bs object.
7577
7578 Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
7579 1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
7580 2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
7581 3) Simplicity.
7582 */
7583 drflac_uint64 firstPCMFrameInFLACFrame = 0;
7584 drflac_uint64 lastPCMFrameInFLACFrame = 0;
7585 drflac_uint64 pcmFrameCountInThisFrame;
7586
7587 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
7588 return DRFLAC_FALSE;
7589 }
7590
7591 drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
7592
7593 pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
7594
7595 /* If we are seeking to the end of the file and we've just hit it, we're done. */
7596 if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
7597 drflac_result result = drflac__decode_flac_frame(pFlac);
7598 if (result == DRFLAC_SUCCESS) {
7599 pFlac->currentPCMFrame = pcmFrameIndex;
7600 pFlac->currentFLACFrame.pcmFramesRemaining = 0;
7601 return DRFLAC_TRUE;
7602 } else {
7603 return DRFLAC_FALSE;
7604 }
7605 }
7606
7607 if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
7608 /*
7609 The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
7610 it never existed and keep iterating.
7611 */
7612 drflac_result result = drflac__decode_flac_frame(pFlac);
7613 if (result == DRFLAC_SUCCESS) {
7614 /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
7615 drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount); /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
7616 if (pcmFramesToDecode == 0) {
7617 return DRFLAC_TRUE;
7618 }
7619
7620 pFlac->currentPCMFrame = runningPCMFrameCount;
7621
7622 return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode; /* <-- If this fails, something bad has happened (it should never fail). */
7623 } else {
7624 if (result == DRFLAC_CRC_MISMATCH) {
7625 continue; /* CRC mismatch. Pretend this frame never existed. */
7626 } else {
7627 return DRFLAC_FALSE;
7628 }
7629 }
7630 } else {
7631 /*
7632 It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
7633 frame never existed and leave the running sample count untouched.
7634 */
7635 drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
7636 if (result == DRFLAC_SUCCESS) {
7637 runningPCMFrameCount += pcmFrameCountInThisFrame;
7638 } else {
7639 if (result == DRFLAC_CRC_MISMATCH) {
7640 continue; /* CRC mismatch. Pretend this frame never existed. */
7641 } else {
7642 return DRFLAC_FALSE;
7643 }
7644 }
7645 }
7646 }
7647}
7648
7649
7650
7651static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
7652{
7653 drflac_ogg_page_header header;
7654 drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
7655 drflac_uint32 bytesRead = 0;
7656
7657 /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
7658 (void)relaxed;
7659
7660 pInit->container = drflac_container_ogg;
7661 pInit->oggFirstBytePos = 0;
7662
7663 /*
7664 We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
7665 stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
7666 any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
7667 */
7668 if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7669 return DRFLAC_FALSE;
7670 }
7671 pInit->runningFilePos += bytesRead;
7672
7673 for (;;) {
7674 int pageBodySize;
7675
7676 /* Break if we're past the beginning of stream page. */
7677 if ((header.headerType & 0x02) == 0) {
7678 return DRFLAC_FALSE;
7679 }
7680
7681 /* Check if it's a FLAC header. */
7682 pageBodySize = drflac_ogg__get_page_body_size(&header);
7683 if (pageBodySize == 51) { /* 51 = the lacing value of the FLAC header packet. */
7684 /* It could be a FLAC page... */
7685 drflac_uint32 bytesRemainingInPage = pageBodySize;
7686 drflac_uint8 packetType;
7687
7688 if (onRead(pUserData, &packetType, 1) != 1) {
7689 return DRFLAC_FALSE;
7690 }
7691
7692 bytesRemainingInPage -= 1;
7693 if (packetType == 0x7F) {
7694 /* Increasingly more likely to be a FLAC page... */
7695 drflac_uint8 sig[4];
7696 if (onRead(pUserData, sig, 4) != 4) {
7697 return DRFLAC_FALSE;
7698 }
7699
7700 bytesRemainingInPage -= 4;
7701 if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
7702 /* Almost certainly a FLAC page... */
7703 drflac_uint8 mappingVersion[2];
7704 if (onRead(pUserData, mappingVersion, 2) != 2) {
7705 return DRFLAC_FALSE;
7706 }
7707
7708 if (mappingVersion[0] != 1) {
7709 return DRFLAC_FALSE; /* Only supporting version 1.x of the Ogg mapping. */
7710 }
7711
7712 /*
7713 The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
7714 be handling it in a generic way based on the serial number and packet types.
7715 */
7716 if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
7717 return DRFLAC_FALSE;
7718 }
7719
7720 /* Expecting the native FLAC signature "fLaC". */
7721 if (onRead(pUserData, sig, 4) != 4) {
7722 return DRFLAC_FALSE;
7723 }
7724
7725 if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
7726 /* The remaining data in the page should be the STREAMINFO block. */
7727 drflac_streaminfo streaminfo;
7728 drflac_uint8 isLastBlock;
7729 drflac_uint8 blockType;
7730 drflac_uint32 blockSize;
7731 if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
7732 return DRFLAC_FALSE;
7733 }
7734
7735 if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
7736 return DRFLAC_FALSE; /* Invalid block type. First block must be the STREAMINFO block. */
7737 }
7738
7739 if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
7740 /* Success! */
7741 pInit->hasStreamInfoBlock = DRFLAC_TRUE;
7742 pInit->sampleRate = streaminfo.sampleRate;
7743 pInit->channels = streaminfo.channels;
7744 pInit->bitsPerSample = streaminfo.bitsPerSample;
7745 pInit->totalPCMFrameCount = streaminfo.totalPCMFrameCount;
7746 pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
7747 pInit->hasMetadataBlocks = !isLastBlock;
7748
7749 if (onMeta) {
7750 drflac_metadata metadata;
7751 metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
7752 metadata.pRawData = NULL;
7753 metadata.rawDataSize = 0;
7754 metadata.data.streaminfo = streaminfo;
7755 onMeta(pUserDataMD, &metadata);
7756 }
7757
7758 pInit->runningFilePos += pageBodySize;
7759 pInit->oggFirstBytePos = pInit->runningFilePos - 79; /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
7760 pInit->oggSerial = header.serialNumber;
7761 pInit->oggBosHeader = header;
7762 break;
7763 } else {
7764 /* Failed to read STREAMINFO block. Aww, so close... */
7765 return DRFLAC_FALSE;
7766 }
7767 } else {
7768 /* Invalid file. */
7769 return DRFLAC_FALSE;
7770 }
7771 } else {
7772 /* Not a FLAC header. Skip it. */
7773 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7774 return DRFLAC_FALSE;
7775 }
7776 }
7777 } else {
7778 /* Not a FLAC header. Seek past the entire page and move on to the next. */
7779 if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
7780 return DRFLAC_FALSE;
7781 }
7782 }
7783 } else {
7784 if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
7785 return DRFLAC_FALSE;
7786 }
7787 }
7788
7789 pInit->runningFilePos += pageBodySize;
7790
7791
7792 /* Read the header of the next page. */
7793 if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
7794 return DRFLAC_FALSE;
7795 }
7796 pInit->runningFilePos += bytesRead;
7797 }
7798
7799 /*
7800 If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
7801 packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
7802 Ogg bistream object.
7803 */
7804 pInit->hasMetadataBlocks = DRFLAC_TRUE; /* <-- Always have at least VORBIS_COMMENT metadata block. */
7805 return DRFLAC_TRUE;
7806}
7807#endif
7808
7809static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
7810{
7811 drflac_bool32 relaxed;
7812 drflac_uint8 id[4];
7813
7814 if (pInit == NULL || onRead == NULL || onSeek == NULL) {
7815 return DRFLAC_FALSE;
7816 }
7817
7818 DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
7819 pInit->onRead = onRead;
7820 pInit->onSeek = onSeek;
7821 pInit->onMeta = onMeta;
7822 pInit->container = container;
7823 pInit->pUserData = pUserData;
7824 pInit->pUserDataMD = pUserDataMD;
7825
7826 pInit->bs.onRead = onRead;
7827 pInit->bs.onSeek = onSeek;
7828 pInit->bs.pUserData = pUserData;
7829 drflac__reset_cache(&pInit->bs);
7830
7831
7832 /* If the container is explicitly defined then we can try opening in relaxed mode. */
7833 relaxed = container != drflac_container_unknown;
7834
7835 /* Skip over any ID3 tags. */
7836 for (;;) {
7837 if (onRead(pUserData, id, 4) != 4) {
7838 return DRFLAC_FALSE; /* Ran out of data. */
7839 }
7840 pInit->runningFilePos += 4;
7841
7842 if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
7843 drflac_uint8 header[6];
7844 drflac_uint8 flags;
7845 drflac_uint32 headerSize;
7846
7847 if (onRead(pUserData, header, 6) != 6) {
7848 return DRFLAC_FALSE; /* Ran out of data. */
7849 }
7850 pInit->runningFilePos += 6;
7851
7852 flags = header[1];
7853
7854 DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
7855 headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
7856 if (flags & 0x10) {
7857 headerSize += 10;
7858 }
7859
7860 if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
7861 return DRFLAC_FALSE; /* Failed to seek past the tag. */
7862 }
7863 pInit->runningFilePos += headerSize;
7864 } else {
7865 break;
7866 }
7867 }
7868
7869 if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
7870 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7871 }
7872#ifndef DR_FLAC_NO_OGG
7873 if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
7874 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7875 }
7876#endif
7877
7878 /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
7879 if (relaxed) {
7880 if (container == drflac_container_native) {
7881 return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7882 }
7883#ifndef DR_FLAC_NO_OGG
7884 if (container == drflac_container_ogg) {
7885 return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
7886 }
7887#endif
7888 }
7889
7890 /* Unsupported container. */
7891 return DRFLAC_FALSE;
7892}
7893
7894static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
7895{
7896 DRFLAC_ASSERT(pFlac != NULL);
7897 DRFLAC_ASSERT(pInit != NULL);
7898
7899 DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
7900 pFlac->bs = pInit->bs;
7901 pFlac->onMeta = pInit->onMeta;
7902 pFlac->pUserDataMD = pInit->pUserDataMD;
7903 pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
7904 pFlac->sampleRate = pInit->sampleRate;
7905 pFlac->channels = (drflac_uint8)pInit->channels;
7906 pFlac->bitsPerSample = (drflac_uint8)pInit->bitsPerSample;
7907 pFlac->totalPCMFrameCount = pInit->totalPCMFrameCount;
7908 pFlac->container = pInit->container;
7909}
7910
7911
7912static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
7913{
7914 drflac_init_info init;
7915 drflac_uint32 allocationSize;
7916 drflac_uint32 wholeSIMDVectorCountPerChannel;
7917 drflac_uint32 decodedSamplesAllocationSize;
7918#ifndef DR_FLAC_NO_OGG
9e052883 7919 drflac_oggbs* pOggbs = NULL;
2ff0b512 7920#endif
7921 drflac_uint64 firstFramePos;
7922 drflac_uint64 seektablePos;
9e052883 7923 drflac_uint32 seekpointCount;
2ff0b512 7924 drflac_allocation_callbacks allocationCallbacks;
7925 drflac* pFlac;
7926
7927 /* CPU support first. */
7928 drflac__init_cpu_caps();
7929
7930 if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
7931 return NULL;
7932 }
7933
7934 if (pAllocationCallbacks != NULL) {
7935 allocationCallbacks = *pAllocationCallbacks;
7936 if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
7937 return NULL; /* Invalid allocation callbacks. */
7938 }
7939 } else {
7940 allocationCallbacks.pUserData = NULL;
7941 allocationCallbacks.onMalloc = drflac__malloc_default;
7942 allocationCallbacks.onRealloc = drflac__realloc_default;
7943 allocationCallbacks.onFree = drflac__free_default;
7944 }
7945
7946
7947 /*
7948 The size of the allocation for the drflac object needs to be large enough to fit the following:
7949 1) The main members of the drflac structure
7950 2) A block of memory large enough to store the decoded samples of the largest frame in the stream
7951 3) If the container is Ogg, a drflac_oggbs object
7952
7953 The complicated part of the allocation is making sure there's enough room the decoded samples, taking into consideration
7954 the different SIMD instruction sets.
7955 */
7956 allocationSize = sizeof(drflac);
7957
7958 /*
7959 The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
7960 we are supporting.
7961 */
7962 if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
7963 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
7964 } else {
7965 wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
7966 }
7967
7968 decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
7969
7970 allocationSize += decodedSamplesAllocationSize;
7971 allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE; /* Allocate extra bytes to ensure we have enough for alignment. */
7972
7973#ifndef DR_FLAC_NO_OGG
7974 /* There's additional data required for Ogg streams. */
7975 if (init.container == drflac_container_ogg) {
7976 allocationSize += sizeof(drflac_oggbs);
2ff0b512 7977
9e052883 7978 pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
7979 if (pOggbs == NULL) {
7980 return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
7981 }
7982
7983 DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
7984 pOggbs->onRead = onRead;
7985 pOggbs->onSeek = onSeek;
7986 pOggbs->pUserData = pUserData;
7987 pOggbs->currentBytePos = init.oggFirstBytePos;
7988 pOggbs->firstBytePos = init.oggFirstBytePos;
7989 pOggbs->serialNumber = init.oggSerial;
7990 pOggbs->bosPageHeader = init.oggBosHeader;
7991 pOggbs->bytesRemainingInPage = 0;
2ff0b512 7992 }
7993#endif
7994
7995 /*
7996 This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
7997 consist of only a single heap allocation. To this, the size of the seek table needs to be known, which we determine when reading
7998 and decoding the metadata.
7999 */
9e052883 8000 firstFramePos = 42; /* <-- We know we are at byte 42 at this point. */
8001 seektablePos = 0;
8002 seekpointCount = 0;
2ff0b512 8003 if (init.hasMetadataBlocks) {
8004 drflac_read_proc onReadOverride = onRead;
8005 drflac_seek_proc onSeekOverride = onSeek;
8006 void* pUserDataOverride = pUserData;
8007
8008#ifndef DR_FLAC_NO_OGG
8009 if (init.container == drflac_container_ogg) {
8010 onReadOverride = drflac__on_read_ogg;
8011 onSeekOverride = drflac__on_seek_ogg;
9e052883 8012 pUserDataOverride = (void*)pOggbs;
2ff0b512 8013 }
8014#endif
8015
9e052883 8016 if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
8017 #ifndef DR_FLAC_NO_OGG
8018 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8019 #endif
2ff0b512 8020 return NULL;
8021 }
8022
9e052883 8023 allocationSize += seekpointCount * sizeof(drflac_seekpoint);
2ff0b512 8024 }
8025
8026
8027 pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
8028 if (pFlac == NULL) {
9e052883 8029 #ifndef DR_FLAC_NO_OGG
8030 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8031 #endif
2ff0b512 8032 return NULL;
8033 }
8034
8035 drflac__init_from_info(pFlac, &init);
8036 pFlac->allocationCallbacks = allocationCallbacks;
8037 pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
8038
8039#ifndef DR_FLAC_NO_OGG
8040 if (init.container == drflac_container_ogg) {
9e052883 8041 drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
8042 DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));
8043
8044 /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
8045 drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
8046 pOggbs = NULL;
2ff0b512 8047
8048 /* The Ogg bistream needs to be layered on top of the original bitstream. */
8049 pFlac->bs.onRead = drflac__on_read_ogg;
8050 pFlac->bs.onSeek = drflac__on_seek_ogg;
8051 pFlac->bs.pUserData = (void*)pInternalOggbs;
8052 pFlac->_oggbs = (void*)pInternalOggbs;
8053 }
8054#endif
8055
8056 pFlac->firstFLACFramePosInBytes = firstFramePos;
8057
8058 /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
8059#ifndef DR_FLAC_NO_OGG
8060 if (init.container == drflac_container_ogg)
8061 {
8062 pFlac->pSeekpoints = NULL;
8063 pFlac->seekpointCount = 0;
8064 }
8065 else
8066#endif
8067 {
8068 /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
8069 if (seektablePos != 0) {
9e052883 8070 pFlac->seekpointCount = seekpointCount;
2ff0b512 8071 pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
8072
8073 DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
8074 DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
8075
8076 /* Seek to the seektable, then just read directly into our seektable buffer. */
8077 if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
9e052883 8078 drflac_uint32 iSeekpoint;
8079
8080 for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
8081 if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
8082 /* Endian swap. */
2ff0b512 8083 pFlac->pSeekpoints[iSeekpoint].firstPCMFrame = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
8084 pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
8085 pFlac->pSeekpoints[iSeekpoint].pcmFrameCount = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
9e052883 8086 } else {
8087 /* Failed to read the seektable. Pretend we don't have one. */
8088 pFlac->pSeekpoints = NULL;
8089 pFlac->seekpointCount = 0;
8090 break;
f5b7bb83 8091 }
f5b7bb83 8092 }
2ff0b512 8093
f5b7bb83 8094 /* We need to seek back to where we were. If this fails it's a critical error. */
8095 if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
8096 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8097 return NULL;
8098 }
8099 } else {
8100 /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
8101 pFlac->pSeekpoints = NULL;
8102 pFlac->seekpointCount = 0;
8103 }
8104 }
2ff0b512 8105 }
8106
2ff0b512 8107
9e052883 8108 /*
8109 If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
8110 the first frame.
8111 */
8112 if (!init.hasStreamInfoBlock) {
8113 pFlac->currentFLACFrame.header = init.firstFrameHeader;
8114 for (;;) {
8115 drflac_result result = drflac__decode_flac_frame(pFlac);
8116 if (result == DRFLAC_SUCCESS) {
8117 break;
8118 } else {
8119 if (result == DRFLAC_CRC_MISMATCH) {
8120 if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
8121 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8122 return NULL;
8123 }
8124 continue;
8125 } else {
8126 drflac__free_from_callbacks(pFlac, &allocationCallbacks);
8127 return NULL;
8128 }
8129 }
8130 }
8131 }
8132
8133 return pFlac;
8134}
8135
8136
8137
8138#ifndef DR_FLAC_NO_STDIO
8139#include <stdio.h>
8140#ifndef DR_FLAC_NO_WCHAR
8141#include <wchar.h> /* For wcslen(), wcsrtombs() */
8142#endif
8143
8144/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_WAV_NO_STDIO for now. If something else needs this later we can move it out. */
8145#include <errno.h>
8146static drflac_result drflac_result_from_errno(int e)
8147{
8148 switch (e)
8149 {
8150 case 0: return DRFLAC_SUCCESS;
8151 #ifdef EPERM
8152 case EPERM: return DRFLAC_INVALID_OPERATION;
8153 #endif
8154 #ifdef ENOENT
8155 case ENOENT: return DRFLAC_DOES_NOT_EXIST;
8156 #endif
8157 #ifdef ESRCH
8158 case ESRCH: return DRFLAC_DOES_NOT_EXIST;
8159 #endif
8160 #ifdef EINTR
8161 case EINTR: return DRFLAC_INTERRUPT;
8162 #endif
8163 #ifdef EIO
8164 case EIO: return DRFLAC_IO_ERROR;
8165 #endif
8166 #ifdef ENXIO
8167 case ENXIO: return DRFLAC_DOES_NOT_EXIST;
8168 #endif
8169 #ifdef E2BIG
8170 case E2BIG: return DRFLAC_INVALID_ARGS;
8171 #endif
8172 #ifdef ENOEXEC
8173 case ENOEXEC: return DRFLAC_INVALID_FILE;
8174 #endif
8175 #ifdef EBADF
8176 case EBADF: return DRFLAC_INVALID_FILE;
8177 #endif
8178 #ifdef ECHILD
8179 case ECHILD: return DRFLAC_ERROR;
8180 #endif
8181 #ifdef EAGAIN
8182 case EAGAIN: return DRFLAC_UNAVAILABLE;
8183 #endif
8184 #ifdef ENOMEM
8185 case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
8186 #endif
8187 #ifdef EACCES
8188 case EACCES: return DRFLAC_ACCESS_DENIED;
8189 #endif
8190 #ifdef EFAULT
8191 case EFAULT: return DRFLAC_BAD_ADDRESS;
8192 #endif
8193 #ifdef ENOTBLK
8194 case ENOTBLK: return DRFLAC_ERROR;
8195 #endif
8196 #ifdef EBUSY
8197 case EBUSY: return DRFLAC_BUSY;
8198 #endif
8199 #ifdef EEXIST
8200 case EEXIST: return DRFLAC_ALREADY_EXISTS;
8201 #endif
8202 #ifdef EXDEV
8203 case EXDEV: return DRFLAC_ERROR;
8204 #endif
8205 #ifdef ENODEV
8206 case ENODEV: return DRFLAC_DOES_NOT_EXIST;
8207 #endif
8208 #ifdef ENOTDIR
8209 case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
8210 #endif
8211 #ifdef EISDIR
8212 case EISDIR: return DRFLAC_IS_DIRECTORY;
8213 #endif
8214 #ifdef EINVAL
8215 case EINVAL: return DRFLAC_INVALID_ARGS;
8216 #endif
8217 #ifdef ENFILE
8218 case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8219 #endif
8220 #ifdef EMFILE
8221 case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
8222 #endif
8223 #ifdef ENOTTY
8224 case ENOTTY: return DRFLAC_INVALID_OPERATION;
8225 #endif
8226 #ifdef ETXTBSY
8227 case ETXTBSY: return DRFLAC_BUSY;
8228 #endif
8229 #ifdef EFBIG
8230 case EFBIG: return DRFLAC_TOO_BIG;
8231 #endif
8232 #ifdef ENOSPC
8233 case ENOSPC: return DRFLAC_NO_SPACE;
8234 #endif
8235 #ifdef ESPIPE
8236 case ESPIPE: return DRFLAC_BAD_SEEK;
8237 #endif
8238 #ifdef EROFS
8239 case EROFS: return DRFLAC_ACCESS_DENIED;
8240 #endif
8241 #ifdef EMLINK
8242 case EMLINK: return DRFLAC_TOO_MANY_LINKS;
8243 #endif
8244 #ifdef EPIPE
8245 case EPIPE: return DRFLAC_BAD_PIPE;
8246 #endif
8247 #ifdef EDOM
8248 case EDOM: return DRFLAC_OUT_OF_RANGE;
8249 #endif
8250 #ifdef ERANGE
8251 case ERANGE: return DRFLAC_OUT_OF_RANGE;
8252 #endif
8253 #ifdef EDEADLK
8254 case EDEADLK: return DRFLAC_DEADLOCK;
8255 #endif
8256 #ifdef ENAMETOOLONG
8257 case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
8258 #endif
8259 #ifdef ENOLCK
8260 case ENOLCK: return DRFLAC_ERROR;
8261 #endif
8262 #ifdef ENOSYS
8263 case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
8264 #endif
8265 #ifdef ENOTEMPTY
8266 case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
8267 #endif
8268 #ifdef ELOOP
8269 case ELOOP: return DRFLAC_TOO_MANY_LINKS;
8270 #endif
8271 #ifdef ENOMSG
8272 case ENOMSG: return DRFLAC_NO_MESSAGE;
8273 #endif
8274 #ifdef EIDRM
8275 case EIDRM: return DRFLAC_ERROR;
8276 #endif
8277 #ifdef ECHRNG
8278 case ECHRNG: return DRFLAC_ERROR;
8279 #endif
8280 #ifdef EL2NSYNC
8281 case EL2NSYNC: return DRFLAC_ERROR;
8282 #endif
8283 #ifdef EL3HLT
8284 case EL3HLT: return DRFLAC_ERROR;
8285 #endif
8286 #ifdef EL3RST
8287 case EL3RST: return DRFLAC_ERROR;
8288 #endif
8289 #ifdef ELNRNG
8290 case ELNRNG: return DRFLAC_OUT_OF_RANGE;
8291 #endif
8292 #ifdef EUNATCH
8293 case EUNATCH: return DRFLAC_ERROR;
8294 #endif
8295 #ifdef ENOCSI
8296 case ENOCSI: return DRFLAC_ERROR;
8297 #endif
8298 #ifdef EL2HLT
8299 case EL2HLT: return DRFLAC_ERROR;
8300 #endif
8301 #ifdef EBADE
8302 case EBADE: return DRFLAC_ERROR;
8303 #endif
8304 #ifdef EBADR
8305 case EBADR: return DRFLAC_ERROR;
8306 #endif
8307 #ifdef EXFULL
8308 case EXFULL: return DRFLAC_ERROR;
8309 #endif
8310 #ifdef ENOANO
8311 case ENOANO: return DRFLAC_ERROR;
8312 #endif
8313 #ifdef EBADRQC
8314 case EBADRQC: return DRFLAC_ERROR;
8315 #endif
8316 #ifdef EBADSLT
8317 case EBADSLT: return DRFLAC_ERROR;
8318 #endif
8319 #ifdef EBFONT
8320 case EBFONT: return DRFLAC_INVALID_FILE;
8321 #endif
8322 #ifdef ENOSTR
8323 case ENOSTR: return DRFLAC_ERROR;
8324 #endif
8325 #ifdef ENODATA
8326 case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
8327 #endif
8328 #ifdef ETIME
8329 case ETIME: return DRFLAC_TIMEOUT;
8330 #endif
8331 #ifdef ENOSR
8332 case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
8333 #endif
8334 #ifdef ENONET
8335 case ENONET: return DRFLAC_NO_NETWORK;
8336 #endif
8337 #ifdef ENOPKG
8338 case ENOPKG: return DRFLAC_ERROR;
8339 #endif
8340 #ifdef EREMOTE
8341 case EREMOTE: return DRFLAC_ERROR;
8342 #endif
8343 #ifdef ENOLINK
8344 case ENOLINK: return DRFLAC_ERROR;
8345 #endif
8346 #ifdef EADV
8347 case EADV: return DRFLAC_ERROR;
8348 #endif
8349 #ifdef ESRMNT
8350 case ESRMNT: return DRFLAC_ERROR;
8351 #endif
8352 #ifdef ECOMM
8353 case ECOMM: return DRFLAC_ERROR;
8354 #endif
8355 #ifdef EPROTO
8356 case EPROTO: return DRFLAC_ERROR;
8357 #endif
8358 #ifdef EMULTIHOP
8359 case EMULTIHOP: return DRFLAC_ERROR;
8360 #endif
8361 #ifdef EDOTDOT
8362 case EDOTDOT: return DRFLAC_ERROR;
8363 #endif
8364 #ifdef EBADMSG
8365 case EBADMSG: return DRFLAC_BAD_MESSAGE;
8366 #endif
8367 #ifdef EOVERFLOW
8368 case EOVERFLOW: return DRFLAC_TOO_BIG;
8369 #endif
8370 #ifdef ENOTUNIQ
8371 case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
8372 #endif
8373 #ifdef EBADFD
8374 case EBADFD: return DRFLAC_ERROR;
8375 #endif
8376 #ifdef EREMCHG
8377 case EREMCHG: return DRFLAC_ERROR;
8378 #endif
8379 #ifdef ELIBACC
8380 case ELIBACC: return DRFLAC_ACCESS_DENIED;
8381 #endif
8382 #ifdef ELIBBAD
8383 case ELIBBAD: return DRFLAC_INVALID_FILE;
8384 #endif
8385 #ifdef ELIBSCN
8386 case ELIBSCN: return DRFLAC_INVALID_FILE;
8387 #endif
8388 #ifdef ELIBMAX
8389 case ELIBMAX: return DRFLAC_ERROR;
8390 #endif
8391 #ifdef ELIBEXEC
8392 case ELIBEXEC: return DRFLAC_ERROR;
8393 #endif
8394 #ifdef EILSEQ
8395 case EILSEQ: return DRFLAC_INVALID_DATA;
8396 #endif
8397 #ifdef ERESTART
8398 case ERESTART: return DRFLAC_ERROR;
8399 #endif
8400 #ifdef ESTRPIPE
8401 case ESTRPIPE: return DRFLAC_ERROR;
8402 #endif
8403 #ifdef EUSERS
8404 case EUSERS: return DRFLAC_ERROR;
8405 #endif
8406 #ifdef ENOTSOCK
8407 case ENOTSOCK: return DRFLAC_NOT_SOCKET;
8408 #endif
8409 #ifdef EDESTADDRREQ
8410 case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
8411 #endif
8412 #ifdef EMSGSIZE
8413 case EMSGSIZE: return DRFLAC_TOO_BIG;
8414 #endif
8415 #ifdef EPROTOTYPE
8416 case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
8417 #endif
8418 #ifdef ENOPROTOOPT
8419 case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
8420 #endif
8421 #ifdef EPROTONOSUPPORT
8422 case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
8423 #endif
8424 #ifdef ESOCKTNOSUPPORT
8425 case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
8426 #endif
8427 #ifdef EOPNOTSUPP
8428 case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
8429 #endif
8430 #ifdef EPFNOSUPPORT
8431 case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
8432 #endif
8433 #ifdef EAFNOSUPPORT
8434 case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
8435 #endif
8436 #ifdef EADDRINUSE
8437 case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
8438 #endif
8439 #ifdef EADDRNOTAVAIL
8440 case EADDRNOTAVAIL: return DRFLAC_ERROR;
8441 #endif
8442 #ifdef ENETDOWN
8443 case ENETDOWN: return DRFLAC_NO_NETWORK;
8444 #endif
8445 #ifdef ENETUNREACH
8446 case ENETUNREACH: return DRFLAC_NO_NETWORK;
8447 #endif
8448 #ifdef ENETRESET
8449 case ENETRESET: return DRFLAC_NO_NETWORK;
8450 #endif
8451 #ifdef ECONNABORTED
8452 case ECONNABORTED: return DRFLAC_NO_NETWORK;
8453 #endif
8454 #ifdef ECONNRESET
8455 case ECONNRESET: return DRFLAC_CONNECTION_RESET;
8456 #endif
8457 #ifdef ENOBUFS
8458 case ENOBUFS: return DRFLAC_NO_SPACE;
8459 #endif
8460 #ifdef EISCONN
8461 case EISCONN: return DRFLAC_ALREADY_CONNECTED;
8462 #endif
8463 #ifdef ENOTCONN
8464 case ENOTCONN: return DRFLAC_NOT_CONNECTED;
8465 #endif
8466 #ifdef ESHUTDOWN
8467 case ESHUTDOWN: return DRFLAC_ERROR;
8468 #endif
8469 #ifdef ETOOMANYREFS
8470 case ETOOMANYREFS: return DRFLAC_ERROR;
8471 #endif
8472 #ifdef ETIMEDOUT
8473 case ETIMEDOUT: return DRFLAC_TIMEOUT;
8474 #endif
8475 #ifdef ECONNREFUSED
8476 case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
8477 #endif
8478 #ifdef EHOSTDOWN
8479 case EHOSTDOWN: return DRFLAC_NO_HOST;
8480 #endif
8481 #ifdef EHOSTUNREACH
8482 case EHOSTUNREACH: return DRFLAC_NO_HOST;
8483 #endif
8484 #ifdef EALREADY
8485 case EALREADY: return DRFLAC_IN_PROGRESS;
8486 #endif
8487 #ifdef EINPROGRESS
8488 case EINPROGRESS: return DRFLAC_IN_PROGRESS;
8489 #endif
8490 #ifdef ESTALE
8491 case ESTALE: return DRFLAC_INVALID_FILE;
8492 #endif
8493 #ifdef EUCLEAN
8494 case EUCLEAN: return DRFLAC_ERROR;
8495 #endif
8496 #ifdef ENOTNAM
8497 case ENOTNAM: return DRFLAC_ERROR;
8498 #endif
8499 #ifdef ENAVAIL
8500 case ENAVAIL: return DRFLAC_ERROR;
8501 #endif
8502 #ifdef EISNAM
8503 case EISNAM: return DRFLAC_ERROR;
8504 #endif
8505 #ifdef EREMOTEIO
8506 case EREMOTEIO: return DRFLAC_IO_ERROR;
8507 #endif
8508 #ifdef EDQUOT
8509 case EDQUOT: return DRFLAC_NO_SPACE;
8510 #endif
8511 #ifdef ENOMEDIUM
8512 case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
8513 #endif
8514 #ifdef EMEDIUMTYPE
8515 case EMEDIUMTYPE: return DRFLAC_ERROR;
8516 #endif
8517 #ifdef ECANCELED
8518 case ECANCELED: return DRFLAC_CANCELLED;
8519 #endif
8520 #ifdef ENOKEY
8521 case ENOKEY: return DRFLAC_ERROR;
8522 #endif
8523 #ifdef EKEYEXPIRED
8524 case EKEYEXPIRED: return DRFLAC_ERROR;
8525 #endif
8526 #ifdef EKEYREVOKED
8527 case EKEYREVOKED: return DRFLAC_ERROR;
8528 #endif
8529 #ifdef EKEYREJECTED
8530 case EKEYREJECTED: return DRFLAC_ERROR;
8531 #endif
8532 #ifdef EOWNERDEAD
8533 case EOWNERDEAD: return DRFLAC_ERROR;
8534 #endif
8535 #ifdef ENOTRECOVERABLE
8536 case ENOTRECOVERABLE: return DRFLAC_ERROR;
8537 #endif
8538 #ifdef ERFKILL
8539 case ERFKILL: return DRFLAC_ERROR;
8540 #endif
8541 #ifdef EHWPOISON
8542 case EHWPOISON: return DRFLAC_ERROR;
8543 #endif
8544 default: return DRFLAC_ERROR;
8545 }
8546}
8547
8548static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
8549{
8550#if defined(_MSC_VER) && _MSC_VER >= 1400
8551 errno_t err;
8552#endif
8553
8554 if (ppFile != NULL) {
8555 *ppFile = NULL; /* Safety. */
8556 }
8557
8558 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8559 return DRFLAC_INVALID_ARGS;
8560 }
8561
8562#if defined(_MSC_VER) && _MSC_VER >= 1400
8563 err = fopen_s(ppFile, pFilePath, pOpenMode);
8564 if (err != 0) {
8565 return drflac_result_from_errno(err);
8566 }
8567#else
8568#if defined(_WIN32) || defined(__APPLE__)
8569 *ppFile = fopen(pFilePath, pOpenMode);
8570#else
8571 #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
8572 *ppFile = fopen64(pFilePath, pOpenMode);
8573 #else
8574 *ppFile = fopen(pFilePath, pOpenMode);
8575 #endif
8576#endif
8577 if (*ppFile == NULL) {
8578 drflac_result result = drflac_result_from_errno(errno);
8579 if (result == DRFLAC_SUCCESS) {
8580 result = DRFLAC_ERROR; /* Just a safety check to make sure we never ever return success when pFile == NULL. */
8581 }
8582
8583 return result;
8584 }
8585#endif
8586
8587 return DRFLAC_SUCCESS;
8588}
8589
8590/*
8591_wfopen() isn't always available in all compilation environments.
8592
8593 * Windows only.
8594 * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
8595 * MinGW-64 (both 32- and 64-bit) seems to support it.
8596 * MinGW wraps it in !defined(__STRICT_ANSI__).
8597 * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
8598
8599This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
8600fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
8601*/
8602#if defined(_WIN32)
8603 #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
8604 #define DRFLAC_HAS_WFOPEN
8605 #endif
8606#endif
8607
8608#ifndef DR_FLAC_NO_WCHAR
8609static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
8610{
8611 if (ppFile != NULL) {
8612 *ppFile = NULL; /* Safety. */
8613 }
8614
8615 if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
8616 return DRFLAC_INVALID_ARGS;
8617 }
8618
8619#if defined(DRFLAC_HAS_WFOPEN)
8620 {
8621 /* Use _wfopen() on Windows. */
8622 #if defined(_MSC_VER) && _MSC_VER >= 1400
8623 errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
8624 if (err != 0) {
8625 return drflac_result_from_errno(err);
8626 }
8627 #else
8628 *ppFile = _wfopen(pFilePath, pOpenMode);
8629 if (*ppFile == NULL) {
8630 return drflac_result_from_errno(errno);
8631 }
8632 #endif
8633 (void)pAllocationCallbacks;
8634 }
8635#else
8636 /*
8637 Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
8638 fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
8639 that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
8640 maintaining state. I've checked this with -std=c89 and it works, but if somebody get's a compiler
8641 error I'll look into improving compatibility.
8642 */
8643
8644 /*
8645 Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
8646 need to abort with an error. If you encounter a compiler lacking such support, add it to this list
8647 and submit a bug report and it'll be added to the library upstream.
8648 */
8649 #if defined(__DJGPP__)
8650 {
8651 /* Nothing to do here. This will fall through to the error check below. */
8652 }
8653 #else
8654 {
8655 mbstate_t mbs;
8656 size_t lenMB;
8657 const wchar_t* pFilePathTemp = pFilePath;
8658 char* pFilePathMB = NULL;
8659 char pOpenModeMB[32] = {0};
8660
8661 /* Get the length first. */
8662 DRFLAC_ZERO_OBJECT(&mbs);
8663 lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
8664 if (lenMB == (size_t)-1) {
8665 return drflac_result_from_errno(errno);
8666 }
8667
8668 pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
8669 if (pFilePathMB == NULL) {
8670 return DRFLAC_OUT_OF_MEMORY;
8671 }
8672
8673 pFilePathTemp = pFilePath;
8674 DRFLAC_ZERO_OBJECT(&mbs);
8675 wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
8676
8677 /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
8678 {
8679 size_t i = 0;
8680 for (;;) {
8681 if (pOpenMode[i] == 0) {
8682 pOpenModeMB[i] = '\0';
8683 break;
8684 }
8685
8686 pOpenModeMB[i] = (char)pOpenMode[i];
8687 i += 1;
8688 }
8689 }
8690
8691 *ppFile = fopen(pFilePathMB, pOpenModeMB);
8692
8693 drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
8694 }
8695 #endif
8696
8697 if (*ppFile == NULL) {
8698 return DRFLAC_ERROR;
8699 }
8700#endif
8701
8702 return DRFLAC_SUCCESS;
8703}
8704#endif
8705
8706static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
8707{
8708 return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
8709}
8710
8711static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
8712{
8713 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
8714
8715 return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
8716}
8717
8718
8719DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8720{
8721 drflac* pFlac;
8722 FILE* pFile;
8723
8724 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8725 return NULL;
8726 }
8727
8728 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8729 if (pFlac == NULL) {
8730 fclose(pFile);
8731 return NULL;
8732 }
8733
8734 return pFlac;
8735}
8736
8737#ifndef DR_FLAC_NO_WCHAR
8738DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
8739{
8740 drflac* pFlac;
8741 FILE* pFile;
8742
8743 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8744 return NULL;
8745 }
8746
8747 pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
8748 if (pFlac == NULL) {
8749 fclose(pFile);
8750 return NULL;
8751 }
8752
8753 return pFlac;
8754}
8755#endif
8756
8757DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8758{
8759 drflac* pFlac;
8760 FILE* pFile;
8761
8762 if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
8763 return NULL;
8764 }
8765
8766 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8767 if (pFlac == NULL) {
8768 fclose(pFile);
8769 return pFlac;
8770 }
8771
8772 return pFlac;
8773}
8774
8775#ifndef DR_FLAC_NO_WCHAR
8776DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8777{
8778 drflac* pFlac;
8779 FILE* pFile;
8780
8781 if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
8782 return NULL;
8783 }
8784
8785 pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
8786 if (pFlac == NULL) {
8787 fclose(pFile);
8788 return pFlac;
2ff0b512 8789 }
8790
8791 return pFlac;
8792}
9e052883 8793#endif
8794#endif /* DR_FLAC_NO_STDIO */
2ff0b512 8795
8796static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
8797{
8798 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8799 size_t bytesRemaining;
8800
8801 DRFLAC_ASSERT(memoryStream != NULL);
8802 DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
8803
8804 bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
8805 if (bytesToRead > bytesRemaining) {
8806 bytesToRead = bytesRemaining;
8807 }
8808
8809 if (bytesToRead > 0) {
8810 DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
8811 memoryStream->currentReadPos += bytesToRead;
8812 }
8813
8814 return bytesToRead;
8815}
8816
8817static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
8818{
8819 drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
8820
8821 DRFLAC_ASSERT(memoryStream != NULL);
8822 DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
8823
8824 if (offset > (drflac_int64)memoryStream->dataSize) {
8825 return DRFLAC_FALSE;
8826 }
8827
8828 if (origin == drflac_seek_origin_current) {
8829 if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
8830 memoryStream->currentReadPos += offset;
8831 } else {
8832 return DRFLAC_FALSE; /* Trying to seek too far forward. */
8833 }
8834 } else {
8835 if ((drflac_uint32)offset <= memoryStream->dataSize) {
8836 memoryStream->currentReadPos = offset;
8837 } else {
8838 return DRFLAC_FALSE; /* Trying to seek too far forward. */
8839 }
8840 }
8841
8842 return DRFLAC_TRUE;
8843}
8844
8845DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
8846{
8847 drflac__memory_stream memoryStream;
8848 drflac* pFlac;
8849
8850 memoryStream.data = (const drflac_uint8*)pData;
8851 memoryStream.dataSize = dataSize;
8852 memoryStream.currentReadPos = 0;
8853 pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
8854 if (pFlac == NULL) {
8855 return NULL;
8856 }
8857
8858 pFlac->memoryStream = memoryStream;
8859
8860 /* This is an awful hack... */
8861#ifndef DR_FLAC_NO_OGG
8862 if (pFlac->container == drflac_container_ogg)
8863 {
8864 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8865 oggbs->pUserData = &pFlac->memoryStream;
8866 }
8867 else
8868#endif
8869 {
8870 pFlac->bs.pUserData = &pFlac->memoryStream;
8871 }
8872
8873 return pFlac;
8874}
8875
8876DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8877{
8878 drflac__memory_stream memoryStream;
8879 drflac* pFlac;
8880
8881 memoryStream.data = (const drflac_uint8*)pData;
8882 memoryStream.dataSize = dataSize;
8883 memoryStream.currentReadPos = 0;
8884 pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
8885 if (pFlac == NULL) {
8886 return NULL;
8887 }
8888
8889 pFlac->memoryStream = memoryStream;
8890
8891 /* This is an awful hack... */
8892#ifndef DR_FLAC_NO_OGG
8893 if (pFlac->container == drflac_container_ogg)
8894 {
8895 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8896 oggbs->pUserData = &pFlac->memoryStream;
8897 }
8898 else
8899#endif
8900 {
8901 pFlac->bs.pUserData = &pFlac->memoryStream;
8902 }
8903
8904 return pFlac;
8905}
8906
8907
8908
8909DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8910{
8911 return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8912}
8913DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8914{
8915 return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
8916}
8917
8918DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8919{
8920 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
8921}
8922DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
8923{
8924 return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
8925}
8926
8927DRFLAC_API void drflac_close(drflac* pFlac)
8928{
8929 if (pFlac == NULL) {
8930 return;
8931 }
8932
9e052883 8933#ifndef DR_FLAC_NO_STDIO
8934 /*
8935 If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
8936 was used by looking at the callbacks.
8937 */
8938 if (pFlac->bs.onRead == drflac__on_read_stdio) {
8939 fclose((FILE*)pFlac->bs.pUserData);
8940 }
8941
8942#ifndef DR_FLAC_NO_OGG
8943 /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
8944 if (pFlac->container == drflac_container_ogg) {
8945 drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
8946 DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
8947
8948 if (oggbs->onRead == drflac__on_read_stdio) {
8949 fclose((FILE*)oggbs->pUserData);
8950 }
8951 }
8952#endif
8953#endif
8954
2ff0b512 8955 drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
8956}
8957
9e052883 8958
8959#if 0
8960static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8961{
8962 drflac_uint64 i;
8963 for (i = 0; i < frameCount; ++i) {
8964 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
8965 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
8966 drflac_uint32 right = left - side;
8967
8968 pOutputSamples[i*2+0] = (drflac_int32)left;
8969 pOutputSamples[i*2+1] = (drflac_int32)right;
8970 }
8971}
8972#endif
8973
2ff0b512 8974static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
8975{
8976 drflac_uint64 i;
8977 drflac_uint64 frameCount4 = frameCount >> 2;
8978 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
8979 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
8980 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
8981 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
8982
8983 for (i = 0; i < frameCount4; ++i) {
8984 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
8985 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
8986 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
8987 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
8988
8989 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
8990 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
8991 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
8992 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
8993
8994 drflac_uint32 right0 = left0 - side0;
8995 drflac_uint32 right1 = left1 - side1;
8996 drflac_uint32 right2 = left2 - side2;
8997 drflac_uint32 right3 = left3 - side3;
8998
8999 pOutputSamples[i*8+0] = (drflac_int32)left0;
9000 pOutputSamples[i*8+1] = (drflac_int32)right0;
9001 pOutputSamples[i*8+2] = (drflac_int32)left1;
9002 pOutputSamples[i*8+3] = (drflac_int32)right1;
9003 pOutputSamples[i*8+4] = (drflac_int32)left2;
9004 pOutputSamples[i*8+5] = (drflac_int32)right2;
9005 pOutputSamples[i*8+6] = (drflac_int32)left3;
9006 pOutputSamples[i*8+7] = (drflac_int32)right3;
9007 }
9008
9009 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9010 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9011 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9012 drflac_uint32 right = left - side;
9013
9014 pOutputSamples[i*2+0] = (drflac_int32)left;
9015 pOutputSamples[i*2+1] = (drflac_int32)right;
9016 }
9017}
9018
9019#if defined(DRFLAC_SUPPORT_SSE2)
9020static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9021{
9022 drflac_uint64 i;
9023 drflac_uint64 frameCount4 = frameCount >> 2;
9024 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9025 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9026 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9027 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9028
9029 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9030
9031 for (i = 0; i < frameCount4; ++i) {
9032 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9033 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9034 __m128i right = _mm_sub_epi32(left, side);
9035
9036 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9037 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9038 }
9039
9040 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9041 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9042 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9043 drflac_uint32 right = left - side;
9044
9045 pOutputSamples[i*2+0] = (drflac_int32)left;
9046 pOutputSamples[i*2+1] = (drflac_int32)right;
9047 }
9048}
9049#endif
9050
9051#if defined(DRFLAC_SUPPORT_NEON)
9052static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9053{
9054 drflac_uint64 i;
9055 drflac_uint64 frameCount4 = frameCount >> 2;
9056 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9057 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9058 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9059 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9060 int32x4_t shift0_4;
9061 int32x4_t shift1_4;
9062
9063 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9064
9065 shift0_4 = vdupq_n_s32(shift0);
9066 shift1_4 = vdupq_n_s32(shift1);
9067
9068 for (i = 0; i < frameCount4; ++i) {
9069 uint32x4_t left;
9070 uint32x4_t side;
9071 uint32x4_t right;
9072
9073 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9074 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9075 right = vsubq_u32(left, side);
9076
9077 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9078 }
9079
9080 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9081 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9082 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9083 drflac_uint32 right = left - side;
9084
9085 pOutputSamples[i*2+0] = (drflac_int32)left;
9086 pOutputSamples[i*2+1] = (drflac_int32)right;
9087 }
9088}
9089#endif
9090
9091static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9092{
9093#if defined(DRFLAC_SUPPORT_SSE2)
9094 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9095 drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9096 } else
9097#elif defined(DRFLAC_SUPPORT_NEON)
9098 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9099 drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9100 } else
9101#endif
9102 {
9103 /* Scalar fallback. */
9e052883 9104#if 0
9105 drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9106#else
2ff0b512 9107 drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9108#endif
2ff0b512 9109 }
9110}
9111
9112
9e052883 9113#if 0
9114static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9115{
9116 drflac_uint64 i;
9117 for (i = 0; i < frameCount; ++i) {
9118 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9119 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9120 drflac_uint32 left = right + side;
9121
9122 pOutputSamples[i*2+0] = (drflac_int32)left;
9123 pOutputSamples[i*2+1] = (drflac_int32)right;
9124 }
9125}
9126#endif
9127
2ff0b512 9128static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9129{
9130 drflac_uint64 i;
9131 drflac_uint64 frameCount4 = frameCount >> 2;
9132 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9133 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9134 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9135 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9136
9137 for (i = 0; i < frameCount4; ++i) {
9138 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
9139 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
9140 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
9141 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
9142
9143 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
9144 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
9145 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
9146 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
9147
9148 drflac_uint32 left0 = right0 + side0;
9149 drflac_uint32 left1 = right1 + side1;
9150 drflac_uint32 left2 = right2 + side2;
9151 drflac_uint32 left3 = right3 + side3;
9152
9153 pOutputSamples[i*8+0] = (drflac_int32)left0;
9154 pOutputSamples[i*8+1] = (drflac_int32)right0;
9155 pOutputSamples[i*8+2] = (drflac_int32)left1;
9156 pOutputSamples[i*8+3] = (drflac_int32)right1;
9157 pOutputSamples[i*8+4] = (drflac_int32)left2;
9158 pOutputSamples[i*8+5] = (drflac_int32)right2;
9159 pOutputSamples[i*8+6] = (drflac_int32)left3;
9160 pOutputSamples[i*8+7] = (drflac_int32)right3;
9161 }
9162
9163 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9164 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9165 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9166 drflac_uint32 left = right + side;
9167
9168 pOutputSamples[i*2+0] = (drflac_int32)left;
9169 pOutputSamples[i*2+1] = (drflac_int32)right;
9170 }
9171}
9172
9173#if defined(DRFLAC_SUPPORT_SSE2)
9174static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9175{
9176 drflac_uint64 i;
9177 drflac_uint64 frameCount4 = frameCount >> 2;
9178 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9179 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9180 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9181 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9182
9183 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9184
9185 for (i = 0; i < frameCount4; ++i) {
9186 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9187 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9188 __m128i left = _mm_add_epi32(right, side);
9189
9190 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9191 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9192 }
9193
9194 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9195 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9196 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9197 drflac_uint32 left = right + side;
9198
9199 pOutputSamples[i*2+0] = (drflac_int32)left;
9200 pOutputSamples[i*2+1] = (drflac_int32)right;
9201 }
9202}
9203#endif
9204
9205#if defined(DRFLAC_SUPPORT_NEON)
9206static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9207{
9208 drflac_uint64 i;
9209 drflac_uint64 frameCount4 = frameCount >> 2;
9210 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9211 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9212 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9213 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9214 int32x4_t shift0_4;
9215 int32x4_t shift1_4;
9216
9217 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9218
9219 shift0_4 = vdupq_n_s32(shift0);
9220 shift1_4 = vdupq_n_s32(shift1);
9221
9222 for (i = 0; i < frameCount4; ++i) {
9223 uint32x4_t side;
9224 uint32x4_t right;
9225 uint32x4_t left;
9226
9227 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9228 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9229 left = vaddq_u32(right, side);
9230
9231 drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
9232 }
9233
9234 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9235 drflac_uint32 side = pInputSamples0U32[i] << shift0;
9236 drflac_uint32 right = pInputSamples1U32[i] << shift1;
9237 drflac_uint32 left = right + side;
9238
9239 pOutputSamples[i*2+0] = (drflac_int32)left;
9240 pOutputSamples[i*2+1] = (drflac_int32)right;
9241 }
9242}
9243#endif
9244
9245static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9246{
9247#if defined(DRFLAC_SUPPORT_SSE2)
9248 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9249 drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9250 } else
9251#elif defined(DRFLAC_SUPPORT_NEON)
9252 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9253 drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9254 } else
9255#endif
9256 {
9257 /* Scalar fallback. */
9e052883 9258#if 0
9259 drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9260#else
2ff0b512 9261 drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9262#endif
2ff0b512 9263 }
9264}
9265
9266
9e052883 9267#if 0
9268static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9269{
9270 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9271 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9272 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9273
9274 mid = (mid << 1) | (side & 0x01);
9275
9276 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9277 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9278 }
9279}
9280#endif
9281
2ff0b512 9282static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9283{
9284 drflac_uint64 i;
9285 drflac_uint64 frameCount4 = frameCount >> 2;
9286 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9287 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9288 drflac_int32 shift = unusedBitsPerSample;
9289
9290 if (shift > 0) {
9291 shift -= 1;
9292 for (i = 0; i < frameCount4; ++i) {
9293 drflac_uint32 temp0L;
9294 drflac_uint32 temp1L;
9295 drflac_uint32 temp2L;
9296 drflac_uint32 temp3L;
9297 drflac_uint32 temp0R;
9298 drflac_uint32 temp1R;
9299 drflac_uint32 temp2R;
9300 drflac_uint32 temp3R;
9301
9302 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9303 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9304 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9305 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9306
9307 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9308 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9309 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9310 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9311
9312 mid0 = (mid0 << 1) | (side0 & 0x01);
9313 mid1 = (mid1 << 1) | (side1 & 0x01);
9314 mid2 = (mid2 << 1) | (side2 & 0x01);
9315 mid3 = (mid3 << 1) | (side3 & 0x01);
9316
9317 temp0L = (mid0 + side0) << shift;
9318 temp1L = (mid1 + side1) << shift;
9319 temp2L = (mid2 + side2) << shift;
9320 temp3L = (mid3 + side3) << shift;
9321
9322 temp0R = (mid0 - side0) << shift;
9323 temp1R = (mid1 - side1) << shift;
9324 temp2R = (mid2 - side2) << shift;
9325 temp3R = (mid3 - side3) << shift;
9326
9327 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9328 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9329 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9330 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9331 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9332 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9333 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9334 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9335 }
9336 } else {
9337 for (i = 0; i < frameCount4; ++i) {
9338 drflac_uint32 temp0L;
9339 drflac_uint32 temp1L;
9340 drflac_uint32 temp2L;
9341 drflac_uint32 temp3L;
9342 drflac_uint32 temp0R;
9343 drflac_uint32 temp1R;
9344 drflac_uint32 temp2R;
9345 drflac_uint32 temp3R;
9346
9347 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9348 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9349 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9350 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9351
9352 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9353 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9354 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9355 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9356
9357 mid0 = (mid0 << 1) | (side0 & 0x01);
9358 mid1 = (mid1 << 1) | (side1 & 0x01);
9359 mid2 = (mid2 << 1) | (side2 & 0x01);
9360 mid3 = (mid3 << 1) | (side3 & 0x01);
9361
9362 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
9363 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
9364 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
9365 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
9366
9367 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
9368 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
9369 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
9370 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
9371
9372 pOutputSamples[i*8+0] = (drflac_int32)temp0L;
9373 pOutputSamples[i*8+1] = (drflac_int32)temp0R;
9374 pOutputSamples[i*8+2] = (drflac_int32)temp1L;
9375 pOutputSamples[i*8+3] = (drflac_int32)temp1R;
9376 pOutputSamples[i*8+4] = (drflac_int32)temp2L;
9377 pOutputSamples[i*8+5] = (drflac_int32)temp2R;
9378 pOutputSamples[i*8+6] = (drflac_int32)temp3L;
9379 pOutputSamples[i*8+7] = (drflac_int32)temp3R;
9380 }
9381 }
9382
9383 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9384 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9385 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9386
9387 mid = (mid << 1) | (side & 0x01);
9388
9389 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
9390 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
9391 }
9392}
9393
9394#if defined(DRFLAC_SUPPORT_SSE2)
9395static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9396{
9397 drflac_uint64 i;
9398 drflac_uint64 frameCount4 = frameCount >> 2;
9399 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9400 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9401 drflac_int32 shift = unusedBitsPerSample;
9402
9403 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9404
9405 if (shift == 0) {
9406 for (i = 0; i < frameCount4; ++i) {
9407 __m128i mid;
9408 __m128i side;
9409 __m128i left;
9410 __m128i right;
9411
9412 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9413 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9414
9415 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9416
9417 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
9418 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
9419
9420 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9421 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9422 }
9423
9424 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9425 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9426 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9427
9428 mid = (mid << 1) | (side & 0x01);
9429
9430 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9431 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9432 }
9433 } else {
9434 shift -= 1;
9435 for (i = 0; i < frameCount4; ++i) {
9436 __m128i mid;
9437 __m128i side;
9438 __m128i left;
9439 __m128i right;
9440
9441 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9442 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9443
9444 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
9445
9446 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
9447 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
9448
9449 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9450 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9451 }
9452
9453 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9454 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9455 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9456
9457 mid = (mid << 1) | (side & 0x01);
9458
9459 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9460 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9461 }
9462 }
9463}
9464#endif
9465
9466#if defined(DRFLAC_SUPPORT_NEON)
9467static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9468{
9469 drflac_uint64 i;
9470 drflac_uint64 frameCount4 = frameCount >> 2;
9471 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9472 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9473 drflac_int32 shift = unusedBitsPerSample;
9474 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
9475 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
9476 uint32x4_t one4;
9477
9478 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9479
9480 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9481 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9482 one4 = vdupq_n_u32(1);
9483
9484 if (shift == 0) {
9485 for (i = 0; i < frameCount4; ++i) {
9486 uint32x4_t mid;
9487 uint32x4_t side;
9488 int32x4_t left;
9489 int32x4_t right;
9490
9491 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9492 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9493
9494 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9495
9496 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
9497 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
9498
9499 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9500 }
9501
9502 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9503 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9504 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9505
9506 mid = (mid << 1) | (side & 0x01);
9507
9508 pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
9509 pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
9510 }
9511 } else {
9512 int32x4_t shift4;
9513
9514 shift -= 1;
9515 shift4 = vdupq_n_s32(shift);
9516
9517 for (i = 0; i < frameCount4; ++i) {
9518 uint32x4_t mid;
9519 uint32x4_t side;
9520 int32x4_t left;
9521 int32x4_t right;
9522
9523 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
9524 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
9525
9526 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
9527
9528 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
9529 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
9530
9531 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9532 }
9533
9534 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9535 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9536 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9537
9538 mid = (mid << 1) | (side & 0x01);
9539
9540 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
9541 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
9542 }
9543 }
9544}
9545#endif
9546
9547static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9548{
9549#if defined(DRFLAC_SUPPORT_SSE2)
9550 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9551 drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9552 } else
9553#elif defined(DRFLAC_SUPPORT_NEON)
9554 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9555 drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9556 } else
9557#endif
9558 {
9559 /* Scalar fallback. */
9e052883 9560#if 0
9561 drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9562#else
2ff0b512 9563 drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9564#endif
2ff0b512 9565 }
9566}
9567
9568
9e052883 9569#if 0
9570static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9571{
9572 for (drflac_uint64 i = 0; i < frameCount; ++i) {
9573 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
9574 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
9575 }
9576}
9577#endif
9578
2ff0b512 9579static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9580{
9581 drflac_uint64 i;
9582 drflac_uint64 frameCount4 = frameCount >> 2;
9583 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9584 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9585 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9586 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9587
9588 for (i = 0; i < frameCount4; ++i) {
9589 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
9590 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
9591 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
9592 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
9593
9594 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
9595 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
9596 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
9597 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
9598
9599 pOutputSamples[i*8+0] = (drflac_int32)tempL0;
9600 pOutputSamples[i*8+1] = (drflac_int32)tempR0;
9601 pOutputSamples[i*8+2] = (drflac_int32)tempL1;
9602 pOutputSamples[i*8+3] = (drflac_int32)tempR1;
9603 pOutputSamples[i*8+4] = (drflac_int32)tempL2;
9604 pOutputSamples[i*8+5] = (drflac_int32)tempR2;
9605 pOutputSamples[i*8+6] = (drflac_int32)tempL3;
9606 pOutputSamples[i*8+7] = (drflac_int32)tempR3;
9607 }
9608
9609 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9610 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9611 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9612 }
9613}
9614
9615#if defined(DRFLAC_SUPPORT_SSE2)
9616static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9617{
9618 drflac_uint64 i;
9619 drflac_uint64 frameCount4 = frameCount >> 2;
9620 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9621 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9622 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9623 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9624
9625 for (i = 0; i < frameCount4; ++i) {
9626 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9627 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9628
9629 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
9630 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
9631 }
9632
9633 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9634 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9635 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9636 }
9637}
9638#endif
9639
9640#if defined(DRFLAC_SUPPORT_NEON)
9641static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9642{
9643 drflac_uint64 i;
9644 drflac_uint64 frameCount4 = frameCount >> 2;
9645 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9646 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9647 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9648 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9649
9650 int32x4_t shift4_0 = vdupq_n_s32(shift0);
9651 int32x4_t shift4_1 = vdupq_n_s32(shift1);
9652
9653 for (i = 0; i < frameCount4; ++i) {
9654 int32x4_t left;
9655 int32x4_t right;
9656
9657 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
9658 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
9659
9660 drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
9661 }
9662
9663 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9664 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
9665 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
9666 }
9667}
9668#endif
9669
9670static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
9671{
9672#if defined(DRFLAC_SUPPORT_SSE2)
9673 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9674 drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9675 } else
9676#elif defined(DRFLAC_SUPPORT_NEON)
9677 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9678 drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9679 } else
9680#endif
9681 {
9682 /* Scalar fallback. */
9e052883 9683#if 0
9684 drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9685#else
2ff0b512 9686 drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9687#endif
2ff0b512 9688 }
9689}
9690
9691
9692DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
9693{
9694 drflac_uint64 framesRead;
9695 drflac_uint32 unusedBitsPerSample;
9696
9697 if (pFlac == NULL || framesToRead == 0) {
9698 return 0;
9699 }
9700
9701 if (pBufferOut == NULL) {
9702 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
9703 }
9704
9705 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
9706 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
9707
9708 framesRead = 0;
9709 while (framesToRead > 0) {
9710 /* If we've run out of samples in this frame, go to the next. */
9711 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
9712 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
9713 break; /* Couldn't read the next frame, so just break from the loop and return. */
9714 }
9715 } else {
9716 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
9717 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
9718 drflac_uint64 frameCountThisIteration = framesToRead;
9719
9720 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
9721 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
9722 }
9723
9724 if (channelCount == 2) {
9725 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
9726 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
9727
9728 switch (pFlac->currentFLACFrame.header.channelAssignment)
9729 {
9730 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
9731 {
9732 drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9733 } break;
9734
9735 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
9736 {
9737 drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9738 } break;
9739
9740 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
9741 {
9742 drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9743 } break;
9744
9745 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
9746 default:
9747 {
9748 drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
9749 } break;
9750 }
9751 } else {
9752 /* Generic interleaving. */
9753 drflac_uint64 i;
9754 for (i = 0; i < frameCountThisIteration; ++i) {
9755 unsigned int j;
9756 for (j = 0; j < channelCount; ++j) {
9757 pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
9758 }
9759 }
9760 }
9761
9762 framesRead += frameCountThisIteration;
9763 pBufferOut += frameCountThisIteration * channelCount;
9764 framesToRead -= frameCountThisIteration;
9765 pFlac->currentPCMFrame += frameCountThisIteration;
9766 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
9767 }
9768 }
9769
9770 return framesRead;
9771}
9772
9e052883 9773
9774#if 0
9775static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9776{
9777 drflac_uint64 i;
9778 for (i = 0; i < frameCount; ++i) {
9779 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9780 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9781 drflac_uint32 right = left - side;
9782
9783 left >>= 16;
9784 right >>= 16;
9785
9786 pOutputSamples[i*2+0] = (drflac_int16)left;
9787 pOutputSamples[i*2+1] = (drflac_int16)right;
9788 }
9789}
9790#endif
9791
2ff0b512 9792static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9793{
9794 drflac_uint64 i;
9795 drflac_uint64 frameCount4 = frameCount >> 2;
9796 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9797 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9798 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9799 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9800
9801 for (i = 0; i < frameCount4; ++i) {
9802 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
9803 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
9804 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
9805 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
9806
9807 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
9808 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
9809 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
9810 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
9811
9812 drflac_uint32 right0 = left0 - side0;
9813 drflac_uint32 right1 = left1 - side1;
9814 drflac_uint32 right2 = left2 - side2;
9815 drflac_uint32 right3 = left3 - side3;
9816
9817 left0 >>= 16;
9818 left1 >>= 16;
9819 left2 >>= 16;
9820 left3 >>= 16;
9821
9822 right0 >>= 16;
9823 right1 >>= 16;
9824 right2 >>= 16;
9825 right3 >>= 16;
9826
9827 pOutputSamples[i*8+0] = (drflac_int16)left0;
9828 pOutputSamples[i*8+1] = (drflac_int16)right0;
9829 pOutputSamples[i*8+2] = (drflac_int16)left1;
9830 pOutputSamples[i*8+3] = (drflac_int16)right1;
9831 pOutputSamples[i*8+4] = (drflac_int16)left2;
9832 pOutputSamples[i*8+5] = (drflac_int16)right2;
9833 pOutputSamples[i*8+6] = (drflac_int16)left3;
9834 pOutputSamples[i*8+7] = (drflac_int16)right3;
9835 }
9836
9837 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9838 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9839 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9840 drflac_uint32 right = left - side;
9841
9842 left >>= 16;
9843 right >>= 16;
9844
9845 pOutputSamples[i*2+0] = (drflac_int16)left;
9846 pOutputSamples[i*2+1] = (drflac_int16)right;
9847 }
9848}
9849
9850#if defined(DRFLAC_SUPPORT_SSE2)
9851static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9852{
9853 drflac_uint64 i;
9854 drflac_uint64 frameCount4 = frameCount >> 2;
9855 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9856 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9857 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9858 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9859
9860 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9861
9862 for (i = 0; i < frameCount4; ++i) {
9863 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
9864 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
9865 __m128i right = _mm_sub_epi32(left, side);
9866
9867 left = _mm_srai_epi32(left, 16);
9868 right = _mm_srai_epi32(right, 16);
9869
9870 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
9871 }
9872
9873 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9874 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9875 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9876 drflac_uint32 right = left - side;
9877
9878 left >>= 16;
9879 right >>= 16;
9880
9881 pOutputSamples[i*2+0] = (drflac_int16)left;
9882 pOutputSamples[i*2+1] = (drflac_int16)right;
9883 }
9884}
9885#endif
9886
9887#if defined(DRFLAC_SUPPORT_NEON)
9888static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9889{
9890 drflac_uint64 i;
9891 drflac_uint64 frameCount4 = frameCount >> 2;
9892 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9893 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9894 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9895 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9896 int32x4_t shift0_4;
9897 int32x4_t shift1_4;
9898
9899 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
9900
9901 shift0_4 = vdupq_n_s32(shift0);
9902 shift1_4 = vdupq_n_s32(shift1);
9903
9904 for (i = 0; i < frameCount4; ++i) {
9905 uint32x4_t left;
9906 uint32x4_t side;
9907 uint32x4_t right;
9908
9909 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
9910 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
9911 right = vsubq_u32(left, side);
9912
9913 left = vshrq_n_u32(left, 16);
9914 right = vshrq_n_u32(right, 16);
9915
9916 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
9917 }
9918
9919 for (i = (frameCount4 << 2); i < frameCount; ++i) {
9920 drflac_uint32 left = pInputSamples0U32[i] << shift0;
9921 drflac_uint32 side = pInputSamples1U32[i] << shift1;
9922 drflac_uint32 right = left - side;
9923
9924 left >>= 16;
9925 right >>= 16;
9926
9927 pOutputSamples[i*2+0] = (drflac_int16)left;
9928 pOutputSamples[i*2+1] = (drflac_int16)right;
9929 }
9930}
9931#endif
9932
9933static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9934{
9935#if defined(DRFLAC_SUPPORT_SSE2)
9936 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
9937 drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9938 } else
9939#elif defined(DRFLAC_SUPPORT_NEON)
9940 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
9941 drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9942 } else
9943#endif
9944 {
9945 /* Scalar fallback. */
9e052883 9946#if 0
9947 drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9948#else
2ff0b512 9949 drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 9950#endif
2ff0b512 9951 }
9952}
9953
9954
9e052883 9955#if 0
9956static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9957{
9958 drflac_uint64 i;
9959 for (i = 0; i < frameCount; ++i) {
9960 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
9961 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
9962 drflac_uint32 left = right + side;
9963
9964 left >>= 16;
9965 right >>= 16;
9966
9967 pOutputSamples[i*2+0] = (drflac_int16)left;
9968 pOutputSamples[i*2+1] = (drflac_int16)right;
9969 }
9970}
9971#endif
9972
2ff0b512 9973static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
9974{
9975 drflac_uint64 i;
9976 drflac_uint64 frameCount4 = frameCount >> 2;
9977 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
9978 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
9979 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
9980 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
9981
9982 for (i = 0; i < frameCount4; ++i) {
9983 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
9984 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
9985 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
9986 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
9987
9988 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
9989 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
9990 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
9991 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
9992
9993 drflac_uint32 left0 = right0 + side0;
9994 drflac_uint32 left1 = right1 + side1;
9995 drflac_uint32 left2 = right2 + side2;
9996 drflac_uint32 left3 = right3 + side3;
9997
9998 left0 >>= 16;
9999 left1 >>= 16;
10000 left2 >>= 16;
10001 left3 >>= 16;
10002
10003 right0 >>= 16;
10004 right1 >>= 16;
10005 right2 >>= 16;
10006 right3 >>= 16;
10007
10008 pOutputSamples[i*8+0] = (drflac_int16)left0;
10009 pOutputSamples[i*8+1] = (drflac_int16)right0;
10010 pOutputSamples[i*8+2] = (drflac_int16)left1;
10011 pOutputSamples[i*8+3] = (drflac_int16)right1;
10012 pOutputSamples[i*8+4] = (drflac_int16)left2;
10013 pOutputSamples[i*8+5] = (drflac_int16)right2;
10014 pOutputSamples[i*8+6] = (drflac_int16)left3;
10015 pOutputSamples[i*8+7] = (drflac_int16)right3;
10016 }
10017
10018 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10019 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10020 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10021 drflac_uint32 left = right + side;
10022
10023 left >>= 16;
10024 right >>= 16;
10025
10026 pOutputSamples[i*2+0] = (drflac_int16)left;
10027 pOutputSamples[i*2+1] = (drflac_int16)right;
10028 }
10029}
10030
10031#if defined(DRFLAC_SUPPORT_SSE2)
10032static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10033{
10034 drflac_uint64 i;
10035 drflac_uint64 frameCount4 = frameCount >> 2;
10036 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10037 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10038 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10039 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10040
10041 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10042
10043 for (i = 0; i < frameCount4; ++i) {
10044 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10045 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10046 __m128i left = _mm_add_epi32(right, side);
10047
10048 left = _mm_srai_epi32(left, 16);
10049 right = _mm_srai_epi32(right, 16);
10050
10051 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10052 }
10053
10054 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10055 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10056 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10057 drflac_uint32 left = right + side;
10058
10059 left >>= 16;
10060 right >>= 16;
10061
10062 pOutputSamples[i*2+0] = (drflac_int16)left;
10063 pOutputSamples[i*2+1] = (drflac_int16)right;
10064 }
10065}
10066#endif
10067
10068#if defined(DRFLAC_SUPPORT_NEON)
10069static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10070{
10071 drflac_uint64 i;
10072 drflac_uint64 frameCount4 = frameCount >> 2;
10073 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10074 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10075 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10076 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10077 int32x4_t shift0_4;
10078 int32x4_t shift1_4;
10079
10080 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10081
10082 shift0_4 = vdupq_n_s32(shift0);
10083 shift1_4 = vdupq_n_s32(shift1);
10084
10085 for (i = 0; i < frameCount4; ++i) {
10086 uint32x4_t side;
10087 uint32x4_t right;
10088 uint32x4_t left;
10089
10090 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10091 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10092 left = vaddq_u32(right, side);
10093
10094 left = vshrq_n_u32(left, 16);
10095 right = vshrq_n_u32(right, 16);
10096
10097 drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
10098 }
10099
10100 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10101 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10102 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10103 drflac_uint32 left = right + side;
10104
10105 left >>= 16;
10106 right >>= 16;
10107
10108 pOutputSamples[i*2+0] = (drflac_int16)left;
10109 pOutputSamples[i*2+1] = (drflac_int16)right;
10110 }
10111}
10112#endif
10113
10114static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10115{
10116#if defined(DRFLAC_SUPPORT_SSE2)
10117 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10118 drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10119 } else
10120#elif defined(DRFLAC_SUPPORT_NEON)
10121 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10122 drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10123 } else
10124#endif
10125 {
10126 /* Scalar fallback. */
9e052883 10127#if 0
10128 drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10129#else
2ff0b512 10130 drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10131#endif
2ff0b512 10132 }
10133}
10134
10135
9e052883 10136#if 0
10137static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10138{
10139 for (drflac_uint64 i = 0; i < frameCount; ++i) {
10140 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10141 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10142
10143 mid = (mid << 1) | (side & 0x01);
10144
10145 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
10146 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
10147 }
10148}
10149#endif
2ff0b512 10150
10151static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10152{
10153 drflac_uint64 i;
10154 drflac_uint64 frameCount4 = frameCount >> 2;
10155 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10156 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10157 drflac_uint32 shift = unusedBitsPerSample;
10158
10159 if (shift > 0) {
10160 shift -= 1;
10161 for (i = 0; i < frameCount4; ++i) {
10162 drflac_uint32 temp0L;
10163 drflac_uint32 temp1L;
10164 drflac_uint32 temp2L;
10165 drflac_uint32 temp3L;
10166 drflac_uint32 temp0R;
10167 drflac_uint32 temp1R;
10168 drflac_uint32 temp2R;
10169 drflac_uint32 temp3R;
10170
10171 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10172 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10173 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10174 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10175
10176 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10177 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10178 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10179 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10180
10181 mid0 = (mid0 << 1) | (side0 & 0x01);
10182 mid1 = (mid1 << 1) | (side1 & 0x01);
10183 mid2 = (mid2 << 1) | (side2 & 0x01);
10184 mid3 = (mid3 << 1) | (side3 & 0x01);
10185
10186 temp0L = (mid0 + side0) << shift;
10187 temp1L = (mid1 + side1) << shift;
10188 temp2L = (mid2 + side2) << shift;
10189 temp3L = (mid3 + side3) << shift;
10190
10191 temp0R = (mid0 - side0) << shift;
10192 temp1R = (mid1 - side1) << shift;
10193 temp2R = (mid2 - side2) << shift;
10194 temp3R = (mid3 - side3) << shift;
10195
10196 temp0L >>= 16;
10197 temp1L >>= 16;
10198 temp2L >>= 16;
10199 temp3L >>= 16;
10200
10201 temp0R >>= 16;
10202 temp1R >>= 16;
10203 temp2R >>= 16;
10204 temp3R >>= 16;
10205
10206 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10207 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10208 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10209 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10210 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10211 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10212 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10213 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10214 }
10215 } else {
10216 for (i = 0; i < frameCount4; ++i) {
10217 drflac_uint32 temp0L;
10218 drflac_uint32 temp1L;
10219 drflac_uint32 temp2L;
10220 drflac_uint32 temp3L;
10221 drflac_uint32 temp0R;
10222 drflac_uint32 temp1R;
10223 drflac_uint32 temp2R;
10224 drflac_uint32 temp3R;
10225
10226 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10227 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10228 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10229 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10230
10231 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10232 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10233 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10234 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10235
10236 mid0 = (mid0 << 1) | (side0 & 0x01);
10237 mid1 = (mid1 << 1) | (side1 & 0x01);
10238 mid2 = (mid2 << 1) | (side2 & 0x01);
10239 mid3 = (mid3 << 1) | (side3 & 0x01);
10240
10241 temp0L = ((drflac_int32)(mid0 + side0) >> 1);
10242 temp1L = ((drflac_int32)(mid1 + side1) >> 1);
10243 temp2L = ((drflac_int32)(mid2 + side2) >> 1);
10244 temp3L = ((drflac_int32)(mid3 + side3) >> 1);
10245
10246 temp0R = ((drflac_int32)(mid0 - side0) >> 1);
10247 temp1R = ((drflac_int32)(mid1 - side1) >> 1);
10248 temp2R = ((drflac_int32)(mid2 - side2) >> 1);
10249 temp3R = ((drflac_int32)(mid3 - side3) >> 1);
10250
10251 temp0L >>= 16;
10252 temp1L >>= 16;
10253 temp2L >>= 16;
10254 temp3L >>= 16;
10255
10256 temp0R >>= 16;
10257 temp1R >>= 16;
10258 temp2R >>= 16;
10259 temp3R >>= 16;
10260
10261 pOutputSamples[i*8+0] = (drflac_int16)temp0L;
10262 pOutputSamples[i*8+1] = (drflac_int16)temp0R;
10263 pOutputSamples[i*8+2] = (drflac_int16)temp1L;
10264 pOutputSamples[i*8+3] = (drflac_int16)temp1R;
10265 pOutputSamples[i*8+4] = (drflac_int16)temp2L;
10266 pOutputSamples[i*8+5] = (drflac_int16)temp2R;
10267 pOutputSamples[i*8+6] = (drflac_int16)temp3L;
10268 pOutputSamples[i*8+7] = (drflac_int16)temp3R;
10269 }
10270 }
10271
10272 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10273 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10274 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10275
10276 mid = (mid << 1) | (side & 0x01);
10277
10278 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
10279 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
10280 }
10281}
10282
10283#if defined(DRFLAC_SUPPORT_SSE2)
10284static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10285{
10286 drflac_uint64 i;
10287 drflac_uint64 frameCount4 = frameCount >> 2;
10288 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10289 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10290 drflac_uint32 shift = unusedBitsPerSample;
10291
10292 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10293
10294 if (shift == 0) {
10295 for (i = 0; i < frameCount4; ++i) {
10296 __m128i mid;
10297 __m128i side;
10298 __m128i left;
10299 __m128i right;
10300
10301 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10302 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10303
10304 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10305
10306 left = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
10307 right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
10308
10309 left = _mm_srai_epi32(left, 16);
10310 right = _mm_srai_epi32(right, 16);
10311
10312 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10313 }
10314
10315 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10316 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10317 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10318
10319 mid = (mid << 1) | (side & 0x01);
10320
10321 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10322 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10323 }
10324 } else {
10325 shift -= 1;
10326 for (i = 0; i < frameCount4; ++i) {
10327 __m128i mid;
10328 __m128i side;
10329 __m128i left;
10330 __m128i right;
10331
10332 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10333 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10334
10335 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
10336
10337 left = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
10338 right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
10339
10340 left = _mm_srai_epi32(left, 16);
10341 right = _mm_srai_epi32(right, 16);
10342
10343 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10344 }
10345
10346 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10347 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10348 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10349
10350 mid = (mid << 1) | (side & 0x01);
10351
10352 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10353 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10354 }
10355 }
10356}
10357#endif
10358
10359#if defined(DRFLAC_SUPPORT_NEON)
10360static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10361{
10362 drflac_uint64 i;
10363 drflac_uint64 frameCount4 = frameCount >> 2;
10364 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10365 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10366 drflac_uint32 shift = unusedBitsPerSample;
10367 int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
10368 int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
10369
10370 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10371
10372 wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10373 wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10374
10375 if (shift == 0) {
10376 for (i = 0; i < frameCount4; ++i) {
10377 uint32x4_t mid;
10378 uint32x4_t side;
10379 int32x4_t left;
10380 int32x4_t right;
10381
10382 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10383 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10384
10385 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10386
10387 left = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
10388 right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
10389
10390 left = vshrq_n_s32(left, 16);
10391 right = vshrq_n_s32(right, 16);
10392
10393 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10394 }
10395
10396 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10397 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10398 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10399
10400 mid = (mid << 1) | (side & 0x01);
10401
10402 pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
10403 pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
10404 }
10405 } else {
10406 int32x4_t shift4;
10407
10408 shift -= 1;
10409 shift4 = vdupq_n_s32(shift);
10410
10411 for (i = 0; i < frameCount4; ++i) {
10412 uint32x4_t mid;
10413 uint32x4_t side;
10414 int32x4_t left;
10415 int32x4_t right;
10416
10417 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
10418 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
10419
10420 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
10421
10422 left = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
10423 right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
10424
10425 left = vshrq_n_s32(left, 16);
10426 right = vshrq_n_s32(right, 16);
10427
10428 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10429 }
10430
10431 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10432 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10433 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10434
10435 mid = (mid << 1) | (side & 0x01);
10436
10437 pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
10438 pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
10439 }
10440 }
10441}
10442#endif
10443
10444static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10445{
10446#if defined(DRFLAC_SUPPORT_SSE2)
10447 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10448 drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10449 } else
10450#elif defined(DRFLAC_SUPPORT_NEON)
10451 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10452 drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10453 } else
10454#endif
10455 {
10456 /* Scalar fallback. */
9e052883 10457#if 0
10458 drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10459#else
2ff0b512 10460 drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10461#endif
10462 }
10463}
10464
10465
10466#if 0
10467static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10468{
10469 for (drflac_uint64 i = 0; i < frameCount; ++i) {
10470 pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
10471 pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
2ff0b512 10472 }
10473}
9e052883 10474#endif
2ff0b512 10475
10476static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10477{
10478 drflac_uint64 i;
10479 drflac_uint64 frameCount4 = frameCount >> 2;
10480 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10481 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10482 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10483 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10484
10485 for (i = 0; i < frameCount4; ++i) {
10486 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
10487 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
10488 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
10489 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
10490
10491 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
10492 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
10493 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
10494 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
10495
10496 tempL0 >>= 16;
10497 tempL1 >>= 16;
10498 tempL2 >>= 16;
10499 tempL3 >>= 16;
10500
10501 tempR0 >>= 16;
10502 tempR1 >>= 16;
10503 tempR2 >>= 16;
10504 tempR3 >>= 16;
10505
10506 pOutputSamples[i*8+0] = (drflac_int16)tempL0;
10507 pOutputSamples[i*8+1] = (drflac_int16)tempR0;
10508 pOutputSamples[i*8+2] = (drflac_int16)tempL1;
10509 pOutputSamples[i*8+3] = (drflac_int16)tempR1;
10510 pOutputSamples[i*8+4] = (drflac_int16)tempL2;
10511 pOutputSamples[i*8+5] = (drflac_int16)tempR2;
10512 pOutputSamples[i*8+6] = (drflac_int16)tempL3;
10513 pOutputSamples[i*8+7] = (drflac_int16)tempR3;
10514 }
10515
10516 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10517 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10518 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10519 }
10520}
10521
10522#if defined(DRFLAC_SUPPORT_SSE2)
10523static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10524{
10525 drflac_uint64 i;
10526 drflac_uint64 frameCount4 = frameCount >> 2;
10527 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10528 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10529 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10530 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10531
10532 for (i = 0; i < frameCount4; ++i) {
10533 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10534 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10535
10536 left = _mm_srai_epi32(left, 16);
10537 right = _mm_srai_epi32(right, 16);
10538
10539 /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
10540 _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
10541 }
10542
10543 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10544 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10545 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10546 }
10547}
10548#endif
10549
10550#if defined(DRFLAC_SUPPORT_NEON)
10551static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10552{
10553 drflac_uint64 i;
10554 drflac_uint64 frameCount4 = frameCount >> 2;
10555 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10556 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10557 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10558 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10559
10560 int32x4_t shift0_4 = vdupq_n_s32(shift0);
10561 int32x4_t shift1_4 = vdupq_n_s32(shift1);
10562
10563 for (i = 0; i < frameCount4; ++i) {
10564 int32x4_t left;
10565 int32x4_t right;
10566
10567 left = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
10568 right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
10569
10570 left = vshrq_n_s32(left, 16);
10571 right = vshrq_n_s32(right, 16);
10572
10573 drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
10574 }
10575
10576 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10577 pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
10578 pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
10579 }
10580}
10581#endif
10582
10583static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
10584{
10585#if defined(DRFLAC_SUPPORT_SSE2)
10586 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10587 drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10588 } else
10589#elif defined(DRFLAC_SUPPORT_NEON)
10590 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10591 drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10592 } else
10593#endif
10594 {
10595 /* Scalar fallback. */
9e052883 10596#if 0
10597 drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10598#else
2ff0b512 10599 drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10600#endif
2ff0b512 10601 }
10602}
10603
10604DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
10605{
10606 drflac_uint64 framesRead;
10607 drflac_uint32 unusedBitsPerSample;
10608
10609 if (pFlac == NULL || framesToRead == 0) {
10610 return 0;
10611 }
10612
10613 if (pBufferOut == NULL) {
10614 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
10615 }
10616
10617 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
10618 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
10619
10620 framesRead = 0;
10621 while (framesToRead > 0) {
10622 /* If we've run out of samples in this frame, go to the next. */
10623 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
10624 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
10625 break; /* Couldn't read the next frame, so just break from the loop and return. */
10626 }
10627 } else {
10628 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
10629 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
10630 drflac_uint64 frameCountThisIteration = framesToRead;
10631
10632 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
10633 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
10634 }
10635
10636 if (channelCount == 2) {
10637 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
10638 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
10639
10640 switch (pFlac->currentFLACFrame.header.channelAssignment)
10641 {
10642 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
10643 {
10644 drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10645 } break;
10646
10647 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
10648 {
10649 drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10650 } break;
10651
10652 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
10653 {
10654 drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10655 } break;
10656
10657 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
10658 default:
10659 {
10660 drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
10661 } break;
10662 }
10663 } else {
10664 /* Generic interleaving. */
10665 drflac_uint64 i;
10666 for (i = 0; i < frameCountThisIteration; ++i) {
10667 unsigned int j;
10668 for (j = 0; j < channelCount; ++j) {
10669 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
10670 pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
10671 }
10672 }
10673 }
10674
10675 framesRead += frameCountThisIteration;
10676 pBufferOut += frameCountThisIteration * channelCount;
10677 framesToRead -= frameCountThisIteration;
10678 pFlac->currentPCMFrame += frameCountThisIteration;
10679 pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
10680 }
10681 }
10682
10683 return framesRead;
10684}
10685
9e052883 10686
10687#if 0
10688static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10689{
10690 drflac_uint64 i;
10691 for (i = 0; i < frameCount; ++i) {
10692 drflac_uint32 left = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10693 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10694 drflac_uint32 right = left - side;
10695
10696 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10697 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10698 }
10699}
10700#endif
10701
2ff0b512 10702static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10703{
10704 drflac_uint64 i;
10705 drflac_uint64 frameCount4 = frameCount >> 2;
10706 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10707 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10708 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10709 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10710
10711 float factor = 1 / 2147483648.0;
10712
10713 for (i = 0; i < frameCount4; ++i) {
10714 drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
10715 drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
10716 drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
10717 drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
10718
10719 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
10720 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
10721 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
10722 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
10723
10724 drflac_uint32 right0 = left0 - side0;
10725 drflac_uint32 right1 = left1 - side1;
10726 drflac_uint32 right2 = left2 - side2;
10727 drflac_uint32 right3 = left3 - side3;
10728
10729 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10730 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10731 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10732 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10733 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10734 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10735 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10736 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10737 }
10738
10739 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10740 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10741 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10742 drflac_uint32 right = left - side;
10743
10744 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10745 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10746 }
10747}
10748
10749#if defined(DRFLAC_SUPPORT_SSE2)
10750static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10751{
10752 drflac_uint64 i;
10753 drflac_uint64 frameCount4 = frameCount >> 2;
10754 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10755 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10756 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10757 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10758 __m128 factor;
10759
10760 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10761
10762 factor = _mm_set1_ps(1.0f / 8388608.0f);
10763
10764 for (i = 0; i < frameCount4; ++i) {
10765 __m128i left = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10766 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10767 __m128i right = _mm_sub_epi32(left, side);
10768 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10769 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10770
10771 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10772 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10773 }
10774
10775 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10776 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10777 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10778 drflac_uint32 right = left - side;
10779
10780 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10781 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10782 }
10783}
10784#endif
10785
10786#if defined(DRFLAC_SUPPORT_NEON)
10787static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10788{
10789 drflac_uint64 i;
10790 drflac_uint64 frameCount4 = frameCount >> 2;
10791 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10792 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10793 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10794 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10795 float32x4_t factor4;
10796 int32x4_t shift0_4;
10797 int32x4_t shift1_4;
10798
10799 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10800
10801 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10802 shift0_4 = vdupq_n_s32(shift0);
10803 shift1_4 = vdupq_n_s32(shift1);
10804
10805 for (i = 0; i < frameCount4; ++i) {
10806 uint32x4_t left;
10807 uint32x4_t side;
10808 uint32x4_t right;
10809 float32x4_t leftf;
10810 float32x4_t rightf;
10811
10812 left = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10813 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10814 right = vsubq_u32(left, side);
10815 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10816 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10817
10818 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10819 }
10820
10821 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10822 drflac_uint32 left = pInputSamples0U32[i] << shift0;
10823 drflac_uint32 side = pInputSamples1U32[i] << shift1;
10824 drflac_uint32 right = left - side;
10825
10826 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10827 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10828 }
10829}
10830#endif
10831
10832static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10833{
10834#if defined(DRFLAC_SUPPORT_SSE2)
10835 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
10836 drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10837 } else
10838#elif defined(DRFLAC_SUPPORT_NEON)
10839 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
10840 drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10841 } else
10842#endif
10843 {
10844 /* Scalar fallback. */
9e052883 10845#if 0
10846 drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
10847#else
2ff0b512 10848 drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 10849#endif
2ff0b512 10850 }
10851}
10852
10853
9e052883 10854#if 0
10855static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10856{
10857 drflac_uint64 i;
10858 for (i = 0; i < frameCount; ++i) {
10859 drflac_uint32 side = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
10860 drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
10861 drflac_uint32 left = right + side;
10862
10863 pOutputSamples[i*2+0] = (float)((drflac_int32)left / 2147483648.0);
10864 pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
10865 }
10866}
10867#endif
10868
2ff0b512 10869static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10870{
10871 drflac_uint64 i;
10872 drflac_uint64 frameCount4 = frameCount >> 2;
10873 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10874 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10875 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
10876 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
10877 float factor = 1 / 2147483648.0;
10878
10879 for (i = 0; i < frameCount4; ++i) {
10880 drflac_uint32 side0 = pInputSamples0U32[i*4+0] << shift0;
10881 drflac_uint32 side1 = pInputSamples0U32[i*4+1] << shift0;
10882 drflac_uint32 side2 = pInputSamples0U32[i*4+2] << shift0;
10883 drflac_uint32 side3 = pInputSamples0U32[i*4+3] << shift0;
10884
10885 drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
10886 drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
10887 drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
10888 drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
10889
10890 drflac_uint32 left0 = right0 + side0;
10891 drflac_uint32 left1 = right1 + side1;
10892 drflac_uint32 left2 = right2 + side2;
10893 drflac_uint32 left3 = right3 + side3;
10894
10895 pOutputSamples[i*8+0] = (drflac_int32)left0 * factor;
10896 pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
10897 pOutputSamples[i*8+2] = (drflac_int32)left1 * factor;
10898 pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
10899 pOutputSamples[i*8+4] = (drflac_int32)left2 * factor;
10900 pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
10901 pOutputSamples[i*8+6] = (drflac_int32)left3 * factor;
10902 pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
10903 }
10904
10905 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10906 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10907 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10908 drflac_uint32 left = right + side;
10909
10910 pOutputSamples[i*2+0] = (drflac_int32)left * factor;
10911 pOutputSamples[i*2+1] = (drflac_int32)right * factor;
10912 }
10913}
10914
10915#if defined(DRFLAC_SUPPORT_SSE2)
10916static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10917{
10918 drflac_uint64 i;
10919 drflac_uint64 frameCount4 = frameCount >> 2;
10920 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10921 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10922 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10923 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10924 __m128 factor;
10925
10926 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10927
10928 factor = _mm_set1_ps(1.0f / 8388608.0f);
10929
10930 for (i = 0; i < frameCount4; ++i) {
10931 __m128i side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
10932 __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
10933 __m128i left = _mm_add_epi32(right, side);
10934 __m128 leftf = _mm_mul_ps(_mm_cvtepi32_ps(left), factor);
10935 __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
10936
10937 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
10938 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
10939 }
10940
10941 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10942 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10943 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10944 drflac_uint32 left = right + side;
10945
10946 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10947 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10948 }
10949}
10950#endif
10951
10952#if defined(DRFLAC_SUPPORT_NEON)
10953static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10954{
10955 drflac_uint64 i;
10956 drflac_uint64 frameCount4 = frameCount >> 2;
10957 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
10958 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
10959 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
10960 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
10961 float32x4_t factor4;
10962 int32x4_t shift0_4;
10963 int32x4_t shift1_4;
10964
10965 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
10966
10967 factor4 = vdupq_n_f32(1.0f / 8388608.0f);
10968 shift0_4 = vdupq_n_s32(shift0);
10969 shift1_4 = vdupq_n_s32(shift1);
10970
10971 for (i = 0; i < frameCount4; ++i) {
10972 uint32x4_t side;
10973 uint32x4_t right;
10974 uint32x4_t left;
10975 float32x4_t leftf;
10976 float32x4_t rightf;
10977
10978 side = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
10979 right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
10980 left = vaddq_u32(right, side);
10981 leftf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)), factor4);
10982 rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
10983
10984 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
10985 }
10986
10987 for (i = (frameCount4 << 2); i < frameCount; ++i) {
10988 drflac_uint32 side = pInputSamples0U32[i] << shift0;
10989 drflac_uint32 right = pInputSamples1U32[i] << shift1;
10990 drflac_uint32 left = right + side;
10991
10992 pOutputSamples[i*2+0] = (drflac_int32)left / 8388608.0f;
10993 pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
10994 }
10995}
10996#endif
10997
10998static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
10999{
11000#if defined(DRFLAC_SUPPORT_SSE2)
11001 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11002 drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11003 } else
11004#elif defined(DRFLAC_SUPPORT_NEON)
11005 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11006 drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11007 } else
11008#endif
11009 {
11010 /* Scalar fallback. */
9e052883 11011#if 0
11012 drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11013#else
2ff0b512 11014 drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 11015#endif
11016 }
11017}
11018
11019
11020#if 0
11021static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11022{
11023 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11024 drflac_uint32 mid = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11025 drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11026
11027 mid = (mid << 1) | (side & 0x01);
11028
11029 pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
11030 pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
2ff0b512 11031 }
11032}
9e052883 11033#endif
2ff0b512 11034
11035static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11036{
11037 drflac_uint64 i;
11038 drflac_uint64 frameCount4 = frameCount >> 2;
11039 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11040 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11041 drflac_uint32 shift = unusedBitsPerSample;
11042 float factor = 1 / 2147483648.0;
11043
11044 if (shift > 0) {
11045 shift -= 1;
11046 for (i = 0; i < frameCount4; ++i) {
11047 drflac_uint32 temp0L;
11048 drflac_uint32 temp1L;
11049 drflac_uint32 temp2L;
11050 drflac_uint32 temp3L;
11051 drflac_uint32 temp0R;
11052 drflac_uint32 temp1R;
11053 drflac_uint32 temp2R;
11054 drflac_uint32 temp3R;
11055
11056 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11057 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11058 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11059 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11060
11061 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11062 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11063 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11064 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11065
11066 mid0 = (mid0 << 1) | (side0 & 0x01);
11067 mid1 = (mid1 << 1) | (side1 & 0x01);
11068 mid2 = (mid2 << 1) | (side2 & 0x01);
11069 mid3 = (mid3 << 1) | (side3 & 0x01);
11070
11071 temp0L = (mid0 + side0) << shift;
11072 temp1L = (mid1 + side1) << shift;
11073 temp2L = (mid2 + side2) << shift;
11074 temp3L = (mid3 + side3) << shift;
11075
11076 temp0R = (mid0 - side0) << shift;
11077 temp1R = (mid1 - side1) << shift;
11078 temp2R = (mid2 - side2) << shift;
11079 temp3R = (mid3 - side3) << shift;
11080
11081 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11082 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11083 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11084 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11085 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11086 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11087 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11088 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11089 }
11090 } else {
11091 for (i = 0; i < frameCount4; ++i) {
11092 drflac_uint32 temp0L;
11093 drflac_uint32 temp1L;
11094 drflac_uint32 temp2L;
11095 drflac_uint32 temp3L;
11096 drflac_uint32 temp0R;
11097 drflac_uint32 temp1R;
11098 drflac_uint32 temp2R;
11099 drflac_uint32 temp3R;
11100
11101 drflac_uint32 mid0 = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11102 drflac_uint32 mid1 = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11103 drflac_uint32 mid2 = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11104 drflac_uint32 mid3 = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11105
11106 drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11107 drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11108 drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11109 drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11110
11111 mid0 = (mid0 << 1) | (side0 & 0x01);
11112 mid1 = (mid1 << 1) | (side1 & 0x01);
11113 mid2 = (mid2 << 1) | (side2 & 0x01);
11114 mid3 = (mid3 << 1) | (side3 & 0x01);
11115
11116 temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
11117 temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
11118 temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
11119 temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
11120
11121 temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
11122 temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
11123 temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
11124 temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
11125
11126 pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
11127 pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
11128 pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
11129 pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
11130 pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
11131 pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
11132 pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
11133 pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
11134 }
11135 }
11136
11137 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11138 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11139 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11140
11141 mid = (mid << 1) | (side & 0x01);
11142
11143 pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
11144 pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
11145 }
11146}
11147
11148#if defined(DRFLAC_SUPPORT_SSE2)
11149static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11150{
11151 drflac_uint64 i;
11152 drflac_uint64 frameCount4 = frameCount >> 2;
11153 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11154 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11155 drflac_uint32 shift = unusedBitsPerSample - 8;
11156 float factor;
11157 __m128 factor128;
11158
11159 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11160
11161 factor = 1.0f / 8388608.0f;
11162 factor128 = _mm_set1_ps(factor);
11163
11164 if (shift == 0) {
11165 for (i = 0; i < frameCount4; ++i) {
11166 __m128i mid;
11167 __m128i side;
11168 __m128i tempL;
11169 __m128i tempR;
11170 __m128 leftf;
11171 __m128 rightf;
11172
11173 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11174 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11175
11176 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11177
11178 tempL = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
11179 tempR = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
11180
11181 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11182 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11183
11184 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11185 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11186 }
11187
11188 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11189 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11190 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11191
11192 mid = (mid << 1) | (side & 0x01);
11193
11194 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11195 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11196 }
11197 } else {
11198 shift -= 1;
11199 for (i = 0; i < frameCount4; ++i) {
11200 __m128i mid;
11201 __m128i side;
11202 __m128i tempL;
11203 __m128i tempR;
11204 __m128 leftf;
11205 __m128 rightf;
11206
11207 mid = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11208 side = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11209
11210 mid = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
11211
11212 tempL = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
11213 tempR = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
11214
11215 leftf = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
11216 rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
11217
11218 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11219 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11220 }
11221
11222 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11223 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11224 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11225
11226 mid = (mid << 1) | (side & 0x01);
11227
11228 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11229 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11230 }
11231 }
11232}
11233#endif
11234
11235#if defined(DRFLAC_SUPPORT_NEON)
11236static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11237{
11238 drflac_uint64 i;
11239 drflac_uint64 frameCount4 = frameCount >> 2;
11240 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11241 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11242 drflac_uint32 shift = unusedBitsPerSample - 8;
11243 float factor;
11244 float32x4_t factor4;
11245 int32x4_t shift4;
11246 int32x4_t wbps0_4; /* Wasted Bits Per Sample */
11247 int32x4_t wbps1_4; /* Wasted Bits Per Sample */
11248
11249 DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
11250
11251 factor = 1.0f / 8388608.0f;
11252 factor4 = vdupq_n_f32(factor);
11253 wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
11254 wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
11255
11256 if (shift == 0) {
11257 for (i = 0; i < frameCount4; ++i) {
11258 int32x4_t lefti;
11259 int32x4_t righti;
11260 float32x4_t leftf;
11261 float32x4_t rightf;
11262
11263 uint32x4_t mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11264 uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11265
11266 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11267
11268 lefti = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
11269 righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
11270
11271 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11272 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11273
11274 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11275 }
11276
11277 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11278 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11279 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11280
11281 mid = (mid << 1) | (side & 0x01);
11282
11283 pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
11284 pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
11285 }
11286 } else {
11287 shift -= 1;
11288 shift4 = vdupq_n_s32(shift);
11289 for (i = 0; i < frameCount4; ++i) {
11290 uint32x4_t mid;
11291 uint32x4_t side;
11292 int32x4_t lefti;
11293 int32x4_t righti;
11294 float32x4_t leftf;
11295 float32x4_t rightf;
11296
11297 mid = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
11298 side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
11299
11300 mid = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
11301
11302 lefti = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
11303 righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
11304
11305 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11306 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11307
11308 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11309 }
11310
11311 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11312 drflac_uint32 mid = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11313 drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11314
11315 mid = (mid << 1) | (side & 0x01);
11316
11317 pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
11318 pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
11319 }
11320 }
11321}
11322#endif
11323
11324static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11325{
11326#if defined(DRFLAC_SUPPORT_SSE2)
11327 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11328 drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11329 } else
11330#elif defined(DRFLAC_SUPPORT_NEON)
11331 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11332 drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11333 } else
11334#endif
11335 {
11336 /* Scalar fallback. */
9e052883 11337#if 0
11338 drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11339#else
2ff0b512 11340 drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 11341#endif
11342 }
11343}
11344
11345#if 0
11346static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11347{
11348 for (drflac_uint64 i = 0; i < frameCount; ++i) {
11349 pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
11350 pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
2ff0b512 11351 }
11352}
9e052883 11353#endif
2ff0b512 11354
11355static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11356{
11357 drflac_uint64 i;
11358 drflac_uint64 frameCount4 = frameCount >> 2;
11359 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11360 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11361 drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
11362 drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
11363 float factor = 1 / 2147483648.0;
11364
11365 for (i = 0; i < frameCount4; ++i) {
11366 drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
11367 drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
11368 drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
11369 drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
11370
11371 drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
11372 drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
11373 drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
11374 drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
11375
11376 pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
11377 pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
11378 pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
11379 pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
11380 pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
11381 pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
11382 pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
11383 pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
11384 }
11385
11386 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11387 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11388 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11389 }
11390}
11391
11392#if defined(DRFLAC_SUPPORT_SSE2)
11393static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11394{
11395 drflac_uint64 i;
11396 drflac_uint64 frameCount4 = frameCount >> 2;
11397 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11398 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11399 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11400 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11401
11402 float factor = 1.0f / 8388608.0f;
11403 __m128 factor128 = _mm_set1_ps(factor);
11404
11405 for (i = 0; i < frameCount4; ++i) {
11406 __m128i lefti;
11407 __m128i righti;
11408 __m128 leftf;
11409 __m128 rightf;
11410
11411 lefti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
11412 righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
11413
11414 leftf = _mm_mul_ps(_mm_cvtepi32_ps(lefti), factor128);
11415 rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
11416
11417 _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
11418 _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
11419 }
11420
11421 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11422 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11423 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11424 }
11425}
11426#endif
11427
11428#if defined(DRFLAC_SUPPORT_NEON)
11429static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11430{
11431 drflac_uint64 i;
11432 drflac_uint64 frameCount4 = frameCount >> 2;
11433 const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
11434 const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
11435 drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
11436 drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
11437
11438 float factor = 1.0f / 8388608.0f;
11439 float32x4_t factor4 = vdupq_n_f32(factor);
11440 int32x4_t shift0_4 = vdupq_n_s32(shift0);
11441 int32x4_t shift1_4 = vdupq_n_s32(shift1);
11442
11443 for (i = 0; i < frameCount4; ++i) {
11444 int32x4_t lefti;
11445 int32x4_t righti;
11446 float32x4_t leftf;
11447 float32x4_t rightf;
11448
11449 lefti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
11450 righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
11451
11452 leftf = vmulq_f32(vcvtq_f32_s32(lefti), factor4);
11453 rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
11454
11455 drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
11456 }
11457
11458 for (i = (frameCount4 << 2); i < frameCount; ++i) {
11459 pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
11460 pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
11461 }
11462}
11463#endif
11464
11465static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
11466{
11467#if defined(DRFLAC_SUPPORT_SSE2)
11468 if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
11469 drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11470 } else
11471#elif defined(DRFLAC_SUPPORT_NEON)
11472 if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
11473 drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11474 } else
11475#endif
11476 {
11477 /* Scalar fallback. */
9e052883 11478#if 0
11479 drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
11480#else
2ff0b512 11481 drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
9e052883 11482#endif
2ff0b512 11483 }
11484}
11485
11486DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
11487{
11488 drflac_uint64 framesRead;
11489 drflac_uint32 unusedBitsPerSample;
11490
11491 if (pFlac == NULL || framesToRead == 0) {
11492 return 0;
11493 }
11494
11495 if (pBufferOut == NULL) {
11496 return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
11497 }
11498
11499 DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
11500 unusedBitsPerSample = 32 - pFlac->bitsPerSample;
11501
11502 framesRead = 0;
11503 while (framesToRead > 0) {
11504 /* If we've run out of samples in this frame, go to the next. */
11505 if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
11506 if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
11507 break; /* Couldn't read the next frame, so just break from the loop and return. */
11508 }
11509 } else {
11510 unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
11511 drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
11512 drflac_uint64 frameCountThisIteration = framesToRead;
11513
11514 if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
11515 frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
11516 }
11517
11518 if (channelCount == 2) {
11519 const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
11520 const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
11521
11522 switch (pFlac->currentFLACFrame.header.channelAssignment)
11523 {
11524 case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
11525 {
11526 drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11527 } break;
11528
11529 case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
11530 {
11531 drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11532 } break;
11533
11534 case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
11535 {
11536 drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11537 } break;
11538
11539 case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
11540 default:
11541 {
11542 drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
11543 } break;
11544 }
11545 } else {
11546 /* Generic interleaving. */
11547 drflac_uint64 i;
11548 for (i = 0; i < frameCountThisIteration; ++i) {
11549 unsigned int j;
11550 for (j = 0; j < channelCount; ++j) {
11551 drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
11552 pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
11553 }
11554 }
11555 }
11556
11557 framesRead += frameCountThisIteration;
11558 pBufferOut += frameCountThisIteration * channelCount;
11559 framesToRead -= frameCountThisIteration;
11560 pFlac->currentPCMFrame += frameCountThisIteration;
11561 pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
11562 }
11563 }
11564
11565 return framesRead;
11566}
11567
11568
11569DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
11570{
11571 if (pFlac == NULL) {
11572 return DRFLAC_FALSE;
11573 }
11574
11575 /* Don't do anything if we're already on the seek point. */
11576 if (pFlac->currentPCMFrame == pcmFrameIndex) {
11577 return DRFLAC_TRUE;
11578 }
11579
11580 /*
11581 If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
11582 when the decoder was opened.
11583 */
11584 if (pFlac->firstFLACFramePosInBytes == 0) {
11585 return DRFLAC_FALSE;
11586 }
11587
11588 if (pcmFrameIndex == 0) {
11589 pFlac->currentPCMFrame = 0;
11590 return drflac__seek_to_first_frame(pFlac);
11591 } else {
11592 drflac_bool32 wasSuccessful = DRFLAC_FALSE;
9e052883 11593 drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
2ff0b512 11594
11595 /* Clamp the sample to the end. */
11596 if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
11597 pcmFrameIndex = pFlac->totalPCMFrameCount;
11598 }
11599
11600 /* If the target sample and the current sample are in the same frame we just move the position forward. */
11601 if (pcmFrameIndex > pFlac->currentPCMFrame) {
11602 /* Forward. */
11603 drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
11604 if (pFlac->currentFLACFrame.pcmFramesRemaining > offset) {
11605 pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
11606 pFlac->currentPCMFrame = pcmFrameIndex;
11607 return DRFLAC_TRUE;
11608 }
11609 } else {
11610 /* Backward. */
11611 drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
11612 drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
11613 drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
11614 if (currentFLACFramePCMFramesConsumed > offsetAbs) {
11615 pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
11616 pFlac->currentPCMFrame = pcmFrameIndex;
11617 return DRFLAC_TRUE;
11618 }
11619 }
11620
11621 /*
11622 Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
11623 we'll instead use Ogg's natural seeking facility.
11624 */
11625#ifndef DR_FLAC_NO_OGG
11626 if (pFlac->container == drflac_container_ogg)
11627 {
11628 wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
11629 }
11630 else
11631#endif
11632 {
11633 /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
11634 if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
11635 wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
11636 }
11637
11638#if !defined(DR_FLAC_NO_CRC)
11639 /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
11640 if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
11641 wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
11642 }
11643#endif
11644
11645 /* Fall back to brute force if all else fails. */
11646 if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
11647 wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
11648 }
11649 }
11650
9e052883 11651 if (wasSuccessful) {
11652 pFlac->currentPCMFrame = pcmFrameIndex;
11653 } else {
11654 /* Seek failed. Try putting the decoder back to it's original state. */
11655 if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
11656 /* Failed to seek back to the original PCM frame. Fall back to 0. */
11657 drflac_seek_to_pcm_frame(pFlac, 0);
11658 }
11659 }
11660
2ff0b512 11661 return wasSuccessful;
11662 }
11663}
11664
11665
11666
11667/* High Level APIs */
11668
11669#if defined(SIZE_MAX)
11670 #define DRFLAC_SIZE_MAX SIZE_MAX
11671#else
11672 #if defined(DRFLAC_64BIT)
11673 #define DRFLAC_SIZE_MAX ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
11674 #else
11675 #define DRFLAC_SIZE_MAX 0xFFFFFFFF
11676 #endif
11677#endif
11678
11679
11680/* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
11681#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
11682static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
11683{ \
11684 type* pSampleData = NULL; \
11685 drflac_uint64 totalPCMFrameCount; \
11686 \
11687 DRFLAC_ASSERT(pFlac != NULL); \
11688 \
11689 totalPCMFrameCount = pFlac->totalPCMFrameCount; \
11690 \
11691 if (totalPCMFrameCount == 0) { \
11692 type buffer[4096]; \
11693 drflac_uint64 pcmFramesRead; \
11694 size_t sampleDataBufferSize = sizeof(buffer); \
11695 \
11696 pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks); \
11697 if (pSampleData == NULL) { \
11698 goto on_error; \
11699 } \
11700 \
11701 while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) { \
11702 if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) { \
11703 type* pNewSampleData; \
11704 size_t newSampleDataBufferSize; \
11705 \
11706 newSampleDataBufferSize = sampleDataBufferSize * 2; \
11707 pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks); \
11708 if (pNewSampleData == NULL) { \
11709 drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks); \
11710 goto on_error; \
11711 } \
11712 \
11713 sampleDataBufferSize = newSampleDataBufferSize; \
11714 pSampleData = pNewSampleData; \
11715 } \
11716 \
11717 DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type))); \
11718 totalPCMFrameCount += pcmFramesRead; \
11719 } \
11720 \
11721 /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to \
11722 protect those ears from random noise! */ \
11723 DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type))); \
11724 } else { \
11725 drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type); \
9e052883 11726 if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) { \
2ff0b512 11727 goto on_error; /* The decoded data is too big. */ \
11728 } \
11729 \
11730 pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks); /* <-- Safe cast as per the check above. */ \
11731 if (pSampleData == NULL) { \
11732 goto on_error; \
11733 } \
11734 \
11735 totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData); \
11736 } \
11737 \
11738 if (sampleRateOut) *sampleRateOut = pFlac->sampleRate; \
11739 if (channelsOut) *channelsOut = pFlac->channels; \
11740 if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount; \
11741 \
11742 drflac_close(pFlac); \
11743 return pSampleData; \
11744 \
11745on_error: \
11746 drflac_close(pFlac); \
11747 return NULL; \
11748}
11749
11750DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
11751DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
11752DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
11753
11754DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11755{
11756 drflac* pFlac;
11757
11758 if (channelsOut) {
11759 *channelsOut = 0;
11760 }
11761 if (sampleRateOut) {
11762 *sampleRateOut = 0;
11763 }
11764 if (totalPCMFrameCountOut) {
11765 *totalPCMFrameCountOut = 0;
11766 }
11767
11768 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11769 if (pFlac == NULL) {
11770 return NULL;
11771 }
11772
11773 return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11774}
11775
11776DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11777{
11778 drflac* pFlac;
11779
11780 if (channelsOut) {
11781 *channelsOut = 0;
11782 }
11783 if (sampleRateOut) {
11784 *sampleRateOut = 0;
11785 }
11786 if (totalPCMFrameCountOut) {
11787 *totalPCMFrameCountOut = 0;
11788 }
11789
11790 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11791 if (pFlac == NULL) {
11792 return NULL;
11793 }
11794
11795 return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11796}
11797
11798DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
11799{
11800 drflac* pFlac;
11801
11802 if (channelsOut) {
11803 *channelsOut = 0;
11804 }
11805 if (sampleRateOut) {
11806 *sampleRateOut = 0;
11807 }
11808 if (totalPCMFrameCountOut) {
11809 *totalPCMFrameCountOut = 0;
11810 }
11811
11812 pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
11813 if (pFlac == NULL) {
11814 return NULL;
11815 }
11816
11817 return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
11818}
11819
9e052883 11820#ifndef DR_FLAC_NO_STDIO
11821DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11822{
11823 drflac* pFlac;
11824
11825 if (sampleRate) {
11826 *sampleRate = 0;
11827 }
11828 if (channels) {
11829 *channels = 0;
11830 }
11831 if (totalPCMFrameCount) {
11832 *totalPCMFrameCount = 0;
11833 }
11834
11835 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11836 if (pFlac == NULL) {
11837 return NULL;
11838 }
11839
11840 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11841}
11842
11843DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11844{
11845 drflac* pFlac;
11846
11847 if (sampleRate) {
11848 *sampleRate = 0;
11849 }
11850 if (channels) {
11851 *channels = 0;
11852 }
11853 if (totalPCMFrameCount) {
11854 *totalPCMFrameCount = 0;
11855 }
11856
11857 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11858 if (pFlac == NULL) {
11859 return NULL;
11860 }
11861
11862 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11863}
11864
11865DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11866{
11867 drflac* pFlac;
11868
11869 if (sampleRate) {
11870 *sampleRate = 0;
11871 }
11872 if (channels) {
11873 *channels = 0;
11874 }
11875 if (totalPCMFrameCount) {
11876 *totalPCMFrameCount = 0;
11877 }
11878
11879 pFlac = drflac_open_file(filename, pAllocationCallbacks);
11880 if (pFlac == NULL) {
11881 return NULL;
11882 }
11883
11884 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11885}
11886#endif
11887
2ff0b512 11888DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11889{
11890 drflac* pFlac;
11891
11892 if (sampleRate) {
11893 *sampleRate = 0;
11894 }
11895 if (channels) {
11896 *channels = 0;
11897 }
11898 if (totalPCMFrameCount) {
11899 *totalPCMFrameCount = 0;
11900 }
11901
11902 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11903 if (pFlac == NULL) {
11904 return NULL;
11905 }
11906
11907 return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
11908}
11909
11910DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11911{
11912 drflac* pFlac;
11913
11914 if (sampleRate) {
11915 *sampleRate = 0;
11916 }
11917 if (channels) {
11918 *channels = 0;
11919 }
11920 if (totalPCMFrameCount) {
11921 *totalPCMFrameCount = 0;
11922 }
11923
11924 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11925 if (pFlac == NULL) {
11926 return NULL;
11927 }
11928
11929 return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
11930}
11931
11932DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
11933{
11934 drflac* pFlac;
11935
11936 if (sampleRate) {
11937 *sampleRate = 0;
11938 }
11939 if (channels) {
11940 *channels = 0;
11941 }
11942 if (totalPCMFrameCount) {
11943 *totalPCMFrameCount = 0;
11944 }
11945
11946 pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
11947 if (pFlac == NULL) {
11948 return NULL;
11949 }
11950
11951 return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
11952}
11953
11954
11955DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
11956{
11957 if (pAllocationCallbacks != NULL) {
11958 drflac__free_from_callbacks(p, pAllocationCallbacks);
11959 } else {
11960 drflac__free_default(p, NULL);
11961 }
11962}
11963
11964
11965
11966
11967DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
11968{
11969 if (pIter == NULL) {
11970 return;
11971 }
11972
11973 pIter->countRemaining = commentCount;
11974 pIter->pRunningData = (const char*)pComments;
11975}
11976
11977DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
11978{
11979 drflac_int32 length;
11980 const char* pComment;
11981
11982 /* Safety. */
11983 if (pCommentLengthOut) {
11984 *pCommentLengthOut = 0;
11985 }
11986
11987 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
11988 return NULL;
11989 }
11990
9e052883 11991 length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
2ff0b512 11992 pIter->pRunningData += 4;
11993
11994 pComment = pIter->pRunningData;
11995 pIter->pRunningData += length;
11996 pIter->countRemaining -= 1;
11997
11998 if (pCommentLengthOut) {
11999 *pCommentLengthOut = length;
12000 }
12001
12002 return pComment;
12003}
12004
12005
12006
12007
12008DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
12009{
12010 if (pIter == NULL) {
12011 return;
12012 }
12013
12014 pIter->countRemaining = trackCount;
12015 pIter->pRunningData = (const char*)pTrackData;
12016}
12017
12018DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
12019{
12020 drflac_cuesheet_track cuesheetTrack;
12021 const char* pRunningData;
12022 drflac_uint64 offsetHi;
12023 drflac_uint64 offsetLo;
12024
12025 if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
12026 return DRFLAC_FALSE;
12027 }
12028
12029 pRunningData = pIter->pRunningData;
12030
12031 offsetHi = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
12032 offsetLo = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
12033 cuesheetTrack.offset = offsetLo | (offsetHi << 32);
12034 cuesheetTrack.trackNumber = pRunningData[0]; pRunningData += 1;
12035 DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC)); pRunningData += 12;
12036 cuesheetTrack.isAudio = (pRunningData[0] & 0x80) != 0;
12037 cuesheetTrack.preEmphasis = (pRunningData[0] & 0x40) != 0; pRunningData += 14;
12038 cuesheetTrack.indexCount = pRunningData[0]; pRunningData += 1;
12039 cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData; pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
12040
12041 pIter->pRunningData = pRunningData;
12042 pIter->countRemaining -= 1;
12043
12044 if (pCuesheetTrack) {
12045 *pCuesheetTrack = cuesheetTrack;
12046 }
12047
12048 return DRFLAC_TRUE;
12049}
12050
12051#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
12052 #pragma GCC diagnostic pop
12053#endif
12054#endif /* dr_flac_c */
12055#endif /* DR_FLAC_IMPLEMENTATION */
12056
12057
12058/*
12059REVISION HISTORY
12060================
9e052883 12061v0.12.39 - 2022-09-17
12062 - Fix compilation with DJGPP.
12063 - Fix compilation error with Visual Studio 2019 and the ARM build.
12064 - Fix an error with SSE 4.1 detection.
12065 - Add support for disabling wchar_t with DR_WAV_NO_WCHAR.
12066 - Improve compatibility with compilers which lack support for explicit struct packing.
12067 - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
12068 allocation when loading an Ogg encapsulated file.
12069
12070v0.12.38 - 2022-04-10
12071 - Fix compilation error on older versions of GCC.
12072
12073v0.12.37 - 2022-02-12
12074 - Improve ARM detection.
12075
12076v0.12.36 - 2022-02-07
12077 - Fix a compilation error with the ARM build.
12078
12079v0.12.35 - 2022-02-06
12080 - Fix a bug due to underestimating the amount of precision required for the prediction stage.
12081 - Fix some bugs found from fuzz testing.
12082
12083v0.12.34 - 2022-01-07
12084 - Fix some misalignment bugs when reading metadata.
12085
12086v0.12.33 - 2021-12-22
12087 - Fix a bug with seeking when the seek table does not start at PCM frame 0.
12088
12089v0.12.32 - 2021-12-11
12090 - Fix a warning with Clang.
12091
12092v0.12.31 - 2021-08-16
12093 - Silence some warnings.
12094
12095v0.12.30 - 2021-07-31
12096 - Fix platform detection for ARM64.
12097
12098v0.12.29 - 2021-04-02
12099 - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
12100 - Fix a decoding error due to an incorrect validation check.
12101
2ff0b512 12102v0.12.28 - 2021-02-21
12103 - Fix a warning due to referencing _MSC_VER when it is undefined.
12104
12105v0.12.27 - 2021-01-31
12106 - Fix a static analysis warning.
12107
12108v0.12.26 - 2021-01-17
12109 - Fix a compilation warning due to _BSD_SOURCE being deprecated.
12110
12111v0.12.25 - 2020-12-26
12112 - Update documentation.
12113
12114v0.12.24 - 2020-11-29
12115 - Fix ARM64/NEON detection when compiling with MSVC.
12116
12117v0.12.23 - 2020-11-21
12118 - Fix compilation with OpenWatcom.
12119
12120v0.12.22 - 2020-11-01
12121 - Fix an error with the previous release.
12122
12123v0.12.21 - 2020-11-01
12124 - Fix a possible deadlock when seeking.
12125 - Improve compiler support for older versions of GCC.
12126
12127v0.12.20 - 2020-09-08
12128 - Fix a compilation error on older compilers.
12129
12130v0.12.19 - 2020-08-30
12131 - Fix a bug due to an undefined 32-bit shift.
12132
12133v0.12.18 - 2020-08-14
12134 - Fix a crash when compiling with clang-cl.
12135
12136v0.12.17 - 2020-08-02
12137 - Simplify sized types.
12138
12139v0.12.16 - 2020-07-25
12140 - Fix a compilation warning.
12141
12142v0.12.15 - 2020-07-06
12143 - Check for negative LPC shifts and return an error.
12144
12145v0.12.14 - 2020-06-23
12146 - Add include guard for the implementation section.
12147
12148v0.12.13 - 2020-05-16
12149 - Add compile-time and run-time version querying.
12150 - DRFLAC_VERSION_MINOR
12151 - DRFLAC_VERSION_MAJOR
12152 - DRFLAC_VERSION_REVISION
12153 - DRFLAC_VERSION_STRING
12154 - drflac_version()
12155 - drflac_version_string()
12156
12157v0.12.12 - 2020-04-30
12158 - Fix compilation errors with VC6.
12159
12160v0.12.11 - 2020-04-19
12161 - Fix some pedantic warnings.
12162 - Fix some undefined behaviour warnings.
12163
12164v0.12.10 - 2020-04-10
12165 - Fix some bugs when trying to seek with an invalid seek table.
12166
12167v0.12.9 - 2020-04-05
12168 - Fix warnings.
12169
12170v0.12.8 - 2020-04-04
12171 - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
12172 - Fix some static analysis warnings.
12173 - Minor documentation updates.
12174
12175v0.12.7 - 2020-03-14
12176 - Fix compilation errors with VC6.
12177
12178v0.12.6 - 2020-03-07
12179 - Fix compilation error with Visual Studio .NET 2003.
12180
12181v0.12.5 - 2020-01-30
12182 - Silence some static analysis warnings.
12183
12184v0.12.4 - 2020-01-29
12185 - Silence some static analysis warnings.
12186
12187v0.12.3 - 2019-12-02
12188 - Fix some warnings when compiling with GCC and the -Og flag.
12189 - Fix a crash in out-of-memory situations.
12190 - Fix potential integer overflow bug.
12191 - Fix some static analysis warnings.
12192 - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
12193 - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
12194
12195v0.12.2 - 2019-10-07
12196 - Internal code clean up.
12197
12198v0.12.1 - 2019-09-29
12199 - Fix some Clang Static Analyzer warnings.
12200 - Fix an unused variable warning.
12201
12202v0.12.0 - 2019-09-23
12203 - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
12204 routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
12205 - drflac_open()
12206 - drflac_open_relaxed()
12207 - drflac_open_with_metadata()
12208 - drflac_open_with_metadata_relaxed()
12209 - drflac_open_file()
12210 - drflac_open_file_with_metadata()
12211 - drflac_open_memory()
12212 - drflac_open_memory_with_metadata()
12213 - drflac_open_and_read_pcm_frames_s32()
12214 - drflac_open_and_read_pcm_frames_s16()
12215 - drflac_open_and_read_pcm_frames_f32()
12216 - drflac_open_file_and_read_pcm_frames_s32()
12217 - drflac_open_file_and_read_pcm_frames_s16()
12218 - drflac_open_file_and_read_pcm_frames_f32()
12219 - drflac_open_memory_and_read_pcm_frames_s32()
12220 - drflac_open_memory_and_read_pcm_frames_s16()
12221 - drflac_open_memory_and_read_pcm_frames_f32()
12222 Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
12223 DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
12224 - Remove deprecated APIs:
12225 - drflac_read_s32()
12226 - drflac_read_s16()
12227 - drflac_read_f32()
12228 - drflac_seek_to_sample()
12229 - drflac_open_and_decode_s32()
12230 - drflac_open_and_decode_s16()
12231 - drflac_open_and_decode_f32()
12232 - drflac_open_and_decode_file_s32()
12233 - drflac_open_and_decode_file_s16()
12234 - drflac_open_and_decode_file_f32()
12235 - drflac_open_and_decode_memory_s32()
12236 - drflac_open_and_decode_memory_s16()
12237 - drflac_open_and_decode_memory_f32()
12238 - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
12239 by doing pFlac->totalPCMFrameCount*pFlac->channels.
12240 - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
12241 - Fix errors when seeking to the end of a stream.
12242 - Optimizations to seeking.
12243 - SSE improvements and optimizations.
12244 - ARM NEON optimizations.
12245 - Optimizations to drflac_read_pcm_frames_s16().
12246 - Optimizations to drflac_read_pcm_frames_s32().
12247
12248v0.11.10 - 2019-06-26
12249 - Fix a compiler error.
12250
12251v0.11.9 - 2019-06-16
12252 - Silence some ThreadSanitizer warnings.
12253
12254v0.11.8 - 2019-05-21
12255 - Fix warnings.
12256
12257v0.11.7 - 2019-05-06
12258 - C89 fixes.
12259
12260v0.11.6 - 2019-05-05
12261 - Add support for C89.
12262 - Fix a compiler warning when CRC is disabled.
12263 - Change license to choice of public domain or MIT-0.
12264
12265v0.11.5 - 2019-04-19
12266 - Fix a compiler error with GCC.
12267
12268v0.11.4 - 2019-04-17
12269 - Fix some warnings with GCC when compiling with -std=c99.
12270
12271v0.11.3 - 2019-04-07
12272 - Silence warnings with GCC.
12273
12274v0.11.2 - 2019-03-10
12275 - Fix a warning.
12276
12277v0.11.1 - 2019-02-17
12278 - Fix a potential bug with seeking.
12279
12280v0.11.0 - 2018-12-16
12281 - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
12282 drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
12283 and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
12284 dividing it by the channel count, and then do the same with the return value.
12285 - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
12286 the changes to drflac_read_*() apply.
12287 - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
12288 the changes to drflac_read_*() apply.
12289 - Optimizations.
12290
12291v0.10.0 - 2018-09-11
12292 - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
12293 need to do it yourself via the callback API.
12294 - Fix the clang build.
12295 - Fix undefined behavior.
12296 - Fix errors with CUESHEET metdata blocks.
12297 - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
12298 Vorbis comment API.
12299 - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
12300 - Minor optimizations.
12301
12302v0.9.11 - 2018-08-29
12303 - Fix a bug with sample reconstruction.
12304
12305v0.9.10 - 2018-08-07
12306 - Improve 64-bit detection.
12307
12308v0.9.9 - 2018-08-05
12309 - Fix C++ build on older versions of GCC.
12310
12311v0.9.8 - 2018-07-24
12312 - Fix compilation errors.
12313
12314v0.9.7 - 2018-07-05
12315 - Fix a warning.
12316
12317v0.9.6 - 2018-06-29
12318 - Fix some typos.
12319
12320v0.9.5 - 2018-06-23
12321 - Fix some warnings.
12322
12323v0.9.4 - 2018-06-14
12324 - Optimizations to seeking.
12325 - Clean up.
12326
12327v0.9.3 - 2018-05-22
12328 - Bug fix.
12329
12330v0.9.2 - 2018-05-12
12331 - Fix a compilation error due to a missing break statement.
12332
12333v0.9.1 - 2018-04-29
12334 - Fix compilation error with Clang.
12335
12336v0.9 - 2018-04-24
12337 - Fix Clang build.
12338 - Start using major.minor.revision versioning.
12339
12340v0.8g - 2018-04-19
12341 - Fix build on non-x86/x64 architectures.
12342
12343v0.8f - 2018-02-02
12344 - Stop pretending to support changing rate/channels mid stream.
12345
12346v0.8e - 2018-02-01
12347 - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
12348 - Fix a crash the the Rice partition order is invalid.
12349
12350v0.8d - 2017-09-22
12351 - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
12352
12353v0.8c - 2017-09-07
12354 - Fix warning on non-x86/x64 architectures.
12355
12356v0.8b - 2017-08-19
12357 - Fix build on non-x86/x64 architectures.
12358
12359v0.8a - 2017-08-13
12360 - A small optimization for the Clang build.
12361
12362v0.8 - 2017-08-12
12363 - API CHANGE: Rename dr_* types to drflac_*.
12364 - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
12365 - Add support for custom implementations of malloc(), realloc(), etc.
12366 - Add CRC checking to Ogg encapsulated streams.
12367 - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
12368 - Bug fixes.
12369
12370v0.7 - 2017-07-23
12371 - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
12372
12373v0.6 - 2017-07-22
12374 - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
12375 never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
12376
12377v0.5 - 2017-07-16
12378 - Fix typos.
12379 - Change drflac_bool* types to unsigned.
12380 - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
12381
12382v0.4f - 2017-03-10
12383 - Fix a couple of bugs with the bitstreaming code.
12384
12385v0.4e - 2017-02-17
12386 - Fix some warnings.
12387
12388v0.4d - 2016-12-26
12389 - Add support for 32-bit floating-point PCM decoding.
12390 - Use drflac_int* and drflac_uint* sized types to improve compiler support.
12391 - Minor improvements to documentation.
12392
12393v0.4c - 2016-12-26
12394 - Add support for signed 16-bit integer PCM decoding.
12395
12396v0.4b - 2016-10-23
12397 - A minor change to drflac_bool8 and drflac_bool32 types.
12398
12399v0.4a - 2016-10-11
12400 - Rename drBool32 to drflac_bool32 for styling consistency.
12401
12402v0.4 - 2016-09-29
12403 - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
12404 - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
12405 - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
12406 keep it consistent with drflac_audio.
12407
12408v0.3f - 2016-09-21
12409 - Fix a warning with GCC.
12410
12411v0.3e - 2016-09-18
12412 - Fixed a bug where GCC 4.3+ was not getting properly identified.
12413 - Fixed a few typos.
12414 - Changed date formats to ISO 8601 (YYYY-MM-DD).
12415
12416v0.3d - 2016-06-11
12417 - Minor clean up.
12418
12419v0.3c - 2016-05-28
12420 - Fixed compilation error.
12421
12422v0.3b - 2016-05-16
12423 - Fixed Linux/GCC build.
12424 - Updated documentation.
12425
12426v0.3a - 2016-05-15
12427 - Minor fixes to documentation.
12428
12429v0.3 - 2016-05-11
12430 - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
12431 - Lots of clean up.
12432
12433v0.2b - 2016-05-10
12434 - Bug fixes.
12435
12436v0.2a - 2016-05-10
12437 - Made drflac_open_and_decode() more robust.
12438 - Removed an unused debugging variable
12439
12440v0.2 - 2016-05-09
12441 - Added support for Ogg encapsulation.
12442 - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
12443 should be relative to the start or the current position. Also changes the seeking rules such that
12444 seeking offsets will never be negative.
12445 - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
12446
12447v0.1b - 2016-05-07
12448 - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
12449 - Removed a stale comment.
12450
12451v0.1a - 2016-05-05
12452 - Minor formatting changes.
12453 - Fixed a warning on the GCC build.
12454
12455v0.1 - 2016-05-03
12456 - Initial versioned release.
12457*/
12458
12459/*
12460This software is available as a choice of the following licenses. Choose
12461whichever you prefer.
12462
12463===============================================================================
12464ALTERNATIVE 1 - Public Domain (www.unlicense.org)
12465===============================================================================
12466This is free and unencumbered software released into the public domain.
12467
12468Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
12469software, either in source code form or as a compiled binary, for any purpose,
12470commercial or non-commercial, and by any means.
12471
12472In jurisdictions that recognize copyright laws, the author or authors of this
12473software dedicate any and all copyright interest in the software to the public
12474domain. We make this dedication for the benefit of the public at large and to
12475the detriment of our heirs and successors. We intend this dedication to be an
12476overt act of relinquishment in perpetuity of all present and future rights to
12477this software under copyright law.
12478
12479THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12480IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12481FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12482AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
12483ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
12484WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12485
12486For more information, please refer to <http://unlicense.org/>
12487
12488===============================================================================
12489ALTERNATIVE 2 - MIT No Attribution
12490===============================================================================
12491Copyright 2020 David Reid
12492
12493Permission is hereby granted, free of charge, to any person obtaining a copy of
12494this software and associated documentation files (the "Software"), to deal in
12495the Software without restriction, including without limitation the rights to
12496use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
12497of the Software, and to permit persons to whom the Software is furnished to do
12498so.
12499
12500THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12501IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12502FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
12503AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
12504LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
12505OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
12506SOFTWARE.
12507*/